1. Introduction
Ocean waves worldwide are of significant interest due to their great impact on ocean engineering, resource development, maritime transportation, and fishing. Due to the enhanced development of numerical models, a precise description of sea state events at various scales, from the global to the coastal, is now possible [1].
As some authors have demonstrated [2,3,4], the quality of wind forcing has a significant impact on wave prediction, suggesting that poor wind data might cause inconsistent wave field predictions.
Moored buoys are one of the most important sources of data for assimilation because of their continuous records in time at a given location. However, most buoys are located in coastal waters or adjacent offshore areas. Satellite altimeters can offer long-term observational time series covering ocean areas around the globe. However, the repeat cycle of the altimeter at one specific location can be too long. Altimeter data are usually calibrated with buoy measurements before being used.
A variety of methods have been used for data assimilation. Sequential and variational approaches are the most common types used in wave forecast assimilation. Sequential techniques are mainly used for operational forecasts and follow a step-by-step methodology. The background condition is updated with the observations at each step, and the current state can be used as the starting point for the following prediction step [5,6,7,8]. Optimal interpolation methods have been applied at global and local scales to include wind and wave altimeter data in wave models [9,10] due to their ease of use and reasonable computational cost.
Variational methods involve higher computational costs than sequential methods because they require more iterations, so they are usually found in large weather forecasting centres [11]. These methods can assimilate observations made over a vast spatial area and are the preferred option in global wave forecasts. However, very often the interest is only in improving the forecasts in a very limited area, in which less computationally demanding methods can be adopted, such as the methods developed for local studies [12,13] and for pitch and roll buoys [14,15,16].
In [17], the study focuses on integrating significant wave height (SWH) data collected from distributed ocean wave sensors into ocean models. The paper demonstrates that assimilating these SWH data leads to substantial improvements in the accuracy of wave height predictions, with a 27% reduction in the significant wave height RMSE.
In [18], an assimilation scheme was applied to a regional wave forecasting system. The study specifically focuses on the assimilation of significant wave height measurements from the SARAL/AltiKa, Jason-2, and Jason-3 altimeters using the Optimal Interpolation technique. The assimilation of altimeter data significantly improved wave predictions, especially for swell forecasts, with better results in the northern Indian Ocean. The SWH prediction improved by ~15% in the initial 24 h, the mean wave period improved by ~5–9% during the first 5 days of the forecast, and the peak wave period showed an improvement of up to ~11% up to the 78 h forecast.
In [19], significant wave height data from the HY-2B satellite are assimilated into the WAVEWATCH III model using the Ensemble Optimal Interpolation scheme. The study examines the impact of various ensemble formulations on wave forecast accuracy over the China Seas. The results show that the assimilation improves wave forecasting in nearshore regions significantly more than in offshore areas. According to the study, forecast accuracy may be improved by carefully adjusting ensemble configurations and by increasing the number of satellite observations.
In [20], a study explores the efficiency of assimilating wave spectrum data compared to traditional significant wave height assimilation methods for improving wave simulations. The research highlights the possible advantages of incorporating wave spectrum data into global wave models, which can improve the accuracy of ocean wave simulations in various scenarios.
A local assimilation method to improve the wave forecasts in the area where a ship is navigating was shown to be effective [21]. The analysis of ship motions determines the directional spectrum of the wave system exciting the motions, thus providing the measured data assimilated in the wave forecast. The method for estimating the wave spectrum from measured ship motions [22,23] is the same as the method used to estimate waves from the motions of wave buoys. These estimates can be used to construct decision support systems [24], giving ships access to current information on the wave conditions along their path. In that study, three different assimilation methods were considered. The work suggested that the Kalman filter was an efficient method to improve the accuracy of the predictions, with the bias reduced from −0.40 to −0.18 and the correlation coefficient increasing from 0.84 to 0.96 after the assimilation.
Using the ensemble-based Local Ensemble Transform Kalman Filter (LETKF) assimilation method, [25] recently integrated SWH measured by satellite altimeters along with data from Sofar Spotter buoys, which collect and transmit ocean data in real time [26]. These studies show how numerical wave models can benefit from including buoy and altimeter information, improving their ability to anticipate waves. The LETKF method provides more physically realistic model state updates, resulting in a reduction of forecast errors for lead times of up to 2.5 days.
Models and measurements must be combined to improve prediction ability. In the present study, the focus is on improving the skill of a wave model applied in large ocean areas that cannot be characterised by one-point measurement. Therefore, instead of buoy data, the approach uses data from satellite altimeter products covering the broader area of interest.
The Azores’ maritime routes are essential to the world’s exploration, trade, military strategy and cultural exchange. Its location is still of strategic importance in the context of modern transport and communications networks. The wave conditions in the Azores are influenced by their central Atlantic location and are affected by significant seasonal and regional changes. Summer offers calmer waters, but winter is especially dangerous on the north and west coasts. The characteristics and modelling of wind waves around islands can be significant due to strong diffraction, shoaling and reflection. Also, the sheltering effect of the islands has been shown to reduce the significant wave height between the north and south coasts of the islands.
Hindcasts have biases and uncertainties due to model errors, initial conditions, and parameterisation issues. Without hindcast correction, systematic biases persist, leading to inaccurate forecasts and climate statistics. The Ensemble Kalman Filter (EnKF) can assimilate satellite observations and dynamically correct these errors, improving model predictions and future forecast reliability. A better hindcast leads to improved background error covariance in EnKF, enhancing real-time forecast assimilation, which provides more realistic error estimates for model adjustments. EnKF is a powerful tool to improve weather forecasting, hurricane tracking, and ocean monitoring in the Azores area. Its ability to effectively assimilate various data sources is an asset for scientific research and practical applications in this dynamic region. This study uses satellite observations to apply the EnKF to correct a wave hindcast. The goal is to improve the wave height predictions by assimilating satellite data in space and time around the Azores archipelago at a low computational cost.
This paper is divided into four major sections and a conclusion. After this introductory section, the methodology is presented in Section 2. This is followed by the correction results in Section 3 and the discussion of the obtained results in Section 4. Finally, the last section summarises the work’s outcomes.
2. Methodology
2.1. Wave Model Physics
WAVEWATCH III (WWIII) [27] is a third-generation spectral model that resolves nonlinear interactions and uses an Eulerian approach to solve the wave action balance equation:
$$\frac{DN}{Dt} = \frac{S}{\sigma}$$
where S describes the source terms, N the action density spectrum and σ the relative frequency.
The model’s kinematics are expressed on the left-hand side of the equation, while the physical processes that generate, dissipate, and redistribute wave energy are defined on the right-hand side. In other words, processes including bottom friction, white capping, quadruplet wave-wave interactions, and wind-wave interactions are considered.
SWAN [28] is a third-generation wave model that parameterises nonlinear processes to solve the action balance equation. This model is well suited for coastal water processes because it incorporates additional source components, such as depth-induced wave breaking, triad wave-wave interactions, and the JONSWAP parameterisation for dissipation owing to bottom friction [29]. The model also considers diffraction effects [30].
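Both models solve the spectral action balance equation:
$$\frac{\partial N}{\partial t} + \nabla_{\vec{x}} \cdot \left(\vec{c}_g N\right) + \frac{\partial \left(c_\sigma N\right)}{\partial \sigma} + \frac{\partial \left(c_\theta N\right)}{\partial \theta} = \frac{S}{\sigma}$$
The left-hand side of the equation is the kinematic part: the first term represents the rate of change of action density in time, and the second term represents the propagation of wave energy in two-dimensional space ($\vec{x}$) with group velocity ($\vec{c}_g$). The third term represents the shifting of the radian frequency due to variations in depth and mean currents. Finally, the fourth term represents depth-induced and current-induced refraction, where $c_\sigma$ and $c_\theta$ are the propagation velocities in frequency ($\sigma$) and direction ($\theta$) space. The right-hand side of the equation contains the sink and source terms, which represent all physical processes that generate, dissipate, or redistribute wave energy.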
In deep waters, the following terms can be considered for S: Sin is the wind-wave interaction, Snl is the nonlinear wave-wave interaction, and Sds is the wave dissipation term.
Six processes contribute to S in shallow waters: wave growth by the wind (Sin), the nonlinear transfer of wave energy through three-wave (Snl3) and four-wave interactions (Snl4), and wave decay due to white capping (Sds,w), bottom friction (Sds,b), and depth-induced wave breaking (Sds,br).
2.2. Models’ Setup
WWIII was implemented in an earlier study [21] to generate waves in the North Atlantic (Figure 1a). With 32 frequencies and 24 directions (Table 1), the model was forced with wind and sea-ice cover from ERA5. The bathymetry resolution was 0.5° × 0.5° for the North Atlantic basin. The ERA5 wind input fields, provided by the European Centre for Medium-Range Weather Forecasts (ECMWF) [31], were used to drive the model, with a time resolution of 3 h and a spatial resolution of 0.5° × 0.5°. All parameters are described in Table 1.
For this study, the wave model SWAN (Figure 1b) was set up for the spatial domain of the Azores area. A nonstationary mode is considered, with a 20 min time step, 4 iterations per time step, and a 99% convergence criterion. The wave spectrum is discretised in the spectral space with 29 frequencies (0.0418 Hz to 0.6 Hz) and 36 directions (Table 2). Regarding the physical parameterisation setup, the most important processes are the Janssen wind generation [32], white capping dissipation and quadruplet nonlinear interactions. Wave boundary conditions for SWAN were provided by the large-scale model WWIII as files of parameterised wave spectra. The advantage of using SWAN for intermediate/shallow waters is that it uses implicit numerical schemes that are more robust and economical than explicit schemes such as those used by WWIII.
The GRIDGEN software program [33], which was designed to be used with MATLAB R2024a for grid production, creates the bathymetry. The system uses a bathymetry with 0.05° × 0.05° resolution.
2.3. Hindcast System and Validation
The hindcast system runs from 1 January 2010 to 31 December 2015 and is validated with wave data from three wave buoys (Table 3) provided by the University of Azores, under the project CLIMAAT [34] and supported by ECOMARPORT (https://ecomarport.eu/, accessed on 28 April 2022). The data from these wave buoys are freely available through the European Marine Observation and Data Network (EMODnet, https://emodnet.ec.europa.eu, accessed on 28 April 2022). In general, long-term wave buoy measurements show some missing data, due to rough wave conditions during major storms, ship collisions or buoy maintenance periods. For this study, the buoy and the model are compared at each available time step, and no interpolation is made.
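The pairing of buoy and model records without interpolation can be sketched in Python as follows. This is a minimal illustration and not the code used in the study: the function name `pair_without_interpolation`, the array layout and the use of NaN to flag buoy gaps are assumptions made for this example. Only time stamps present in both series are kept.

```python
import numpy as np

def pair_without_interpolation(model_times, model_hs, buoy_times, buoy_hs):
    """Return (times, model Hs, buoy Hs) at time stamps found in both series."""
    model_times = np.asarray(model_times, dtype="datetime64[m]")
    buoy_times = np.asarray(buoy_times, dtype="datetime64[m]")
    model_hs = np.asarray(model_hs, dtype=float)
    buoy_hs = np.asarray(buoy_hs, dtype=float)

    # Assumption: missing buoy records are flagged as NaN; drop them first.
    valid = ~np.isnan(buoy_hs)
    buoy_times, buoy_hs = buoy_times[valid], buoy_hs[valid]

    # Keep only exact time matches; nothing is interpolated.
    common, i_model, i_buoy = np.intersect1d(
        model_times, buoy_times, return_indices=True
    )
    return common, model_hs[i_model], buoy_hs[i_buoy]
```

Time steps where the buoy has a gap simply drop out of the comparison, so the validation statistics are computed on a reduced but unmodified sample.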
This hindcast system is based on a prior study that implemented the WWIII model, version 5.16, for the North Atlantic basin [21], which provides boundary conditions for the SWAN model applied in the Azores area (Figure 1b). The wind forcing is from the ERA5 database [35] provided by the European Centre for Medium-Range Weather Forecasts (ECMWF), with a time resolution of 3 h and a spatial resolution of 0.25° × 0.25° around the Azores coast. More details are presented in Table 1 and Table 2.
To conduct a statistical evaluation, the following metrics are used: mean error (Bias), scatter index (SI), root mean square error (RMSE), mean absolute error (MAE), Nash-Sutcliffe efficiency (NSE) and Pearson correlation coefficient (r) [21].
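As an illustration, these metrics can be computed with the short Python sketch below. The exact definitions used in the study are those of [21]. The forms coded here are common textbook definitions and are an assumption of this example, in particular the scatter index, which is taken as the RMSE normalised by the mean of the observations. The function name `validation_metrics` is hypothetical.

```python
import numpy as np

def validation_metrics(model, obs):
    """Common model-vs-observation statistics (assumed textbook definitions)."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    diff = model - obs
    rmse = np.sqrt(np.mean(diff ** 2))
    return {
        "bias": np.mean(diff),                      # mean error
        "rmse": rmse,                               # root mean square error
        "mae": np.mean(np.abs(diff)),               # mean absolute error
        "si": rmse / np.mean(obs),                  # scatter index (assumed form)
        # Nash-Sutcliffe efficiency: 1 minus error variance over data variance
        "nse": 1.0 - np.sum(diff ** 2) / np.sum((obs - obs.mean()) ** 2),
        "r": np.corrcoef(model, obs)[0, 1],         # Pearson correlation
    }
```

With this definition the NSE equals 1 for a perfect model and cannot exceed 1, while negative values indicate that the model performs worse than the mean of the observations.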
Table 4 shows that the model and the data have a weak association, as seen by correlation values of 0.80 or less and by large RMSE (1.46, 1.10 and 1.05), MAE (1.21, 0.88 and 0.86) and SI (0.81, 0.66 and 0.77) values for buoys B1, B2 and B3, respectively. The NSE value for buoy B1 is higher than 1, which exceeds the theoretical upper bound of this metric; this is unusual and suggests potential issues with the model or data. The results clearly benefit from corrections and, in this regard, a correction method is implemented. The purpose of the work is not to adjust the model itself but to correct the numerical results using satellite data.
2.4. Data Correction Techniques
Wave model simulations show errors when compared with measurements. A model is developed here to represent the difference between the numerical predictions and the satellite data, and this model is then applied to correct the numerical wave predictions, bringing them closer to the measured altimeter data. For correction methods, it is fundamental to recognise that the quality of observation data sets influences the quality of corrections. In this study, satellite data are used to improve the quality of the hindcast.
The Along-track Altimeter Significant Wave Heights dataset from the GlobWAVE project (/ifremer/cersat/products/swath/altimeters/waves/data) is used in this investigation to correct the system. The altimeter wave data considered extend from 36° N to 41° N latitude and 32° W to 24° W longitude. The altimeter data from seven satellite missions, namely ERS-1, ERS-2, ENVISAT, TOPEX/POSEIDON, Jason-1, Jason-2, and GEOSAT Follow-On, have been combined as part of the project [36] to deliver multi-mission Level 2 post-processed (L2P) SWH data [37]. The satellite-derived altimeter data have gone through post-processing to reduce measurement error [38,39]. Comparisons with buoy data showed that the altimeter Hs is generally in accordance with the in-situ data, with standard deviations of differences of the order of 0.30 m; refs. [40,41] provide more details on these methods.
The GlobWAVE project SWH data points (Figure 2) have a high resolution in time and space compared to the model results, which are constrained to 0.1° and 6 h. Therefore, interpolating the hindcast to pair it with the satellite data may induce errors associated with the different spatiotemporal scales. Instead, the satellite data are positioned on the hindcast grid at the same resolution.
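One possible way to position along-track altimeter samples on a regular grid is sketched below in Python. The study does not describe its gridding procedure, so everything here is an assumption made for illustration: the function name `bin_to_grid`, the nearest-node assignment, and the averaging of all samples that fall on the same node. Only the spatial step is shown; grouping the samples into 6 h windows would be handled in the same way.

```python
import numpy as np

def bin_to_grid(lon, lat, hs, lon0, lat0, step, n_lon, n_lat):
    """Average along-track Hs samples on the nearest node of a regular grid.

    lon0, lat0 : coordinates of the first grid node (degrees)
    step       : grid spacing (degrees)
    Returns an (n_lat, n_lon) array with NaN where no sample was found.
    """
    hs = np.asarray(hs, dtype=float)
    i = np.rint((np.asarray(lon, dtype=float) - lon0) / step).astype(int)
    j = np.rint((np.asarray(lat, dtype=float) - lat0) / step).astype(int)
    inside = (i >= 0) & (i < n_lon) & (j >= 0) & (j < n_lat)

    total = np.zeros((n_lat, n_lon))
    count = np.zeros((n_lat, n_lon))
    # np.add.at accumulates correctly when several samples share a node.
    np.add.at(total, (j[inside], i[inside]), hs[inside])
    np.add.at(count, (j[inside], i[inside]), 1)

    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(count > 0, total / count, np.nan)
```

For the area considered here (32° W to 24° W, 36° N to 41° N) and a 0.1° spacing, a call would use `lon0=-32.0`, `lat0=36.0`, `step=0.1`, `n_lon=81` and `n_lat=51`.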
The Kalman filter is the foundation of the study. As suggested by [42], it is an approach that uses a sequence of observations, including statistical noise and errors, to estimate unknown variables more precisely than is possible with single measurements. The Kalman filter’s optimality is predicated on the assumption that all measurements and errors have Gaussian distributions. The algorithm provides an estimate of the process value that minimises the mean square error.
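The idea can be illustrated with a scalar Kalman update written in Python. This is a generic textbook sketch, not the scheme implemented in the study, and the names `kalman_update`, `x_prior`, `p_prior`, `z` and `r` are chosen here for the example. The gain weights the measurement against the prior estimate according to their error variances.

```python
def kalman_update(x_prior, p_prior, z, r):
    """Scalar Kalman measurement update.

    x_prior : prior estimate (e.g. modelled Hs)
    p_prior : error variance of the prior estimate
    z       : measurement (e.g. observed Hs)
    r       : error variance of the measurement
    """
    gain = p_prior / (p_prior + r)               # weight given to the measurement
    x_post = x_prior + gain * (z - x_prior)      # corrected estimate
    p_post = (1.0 - gain) * p_prior              # reduced error variance
    return x_post, p_post, gain
```

For example, with a modelled value of 2.0 m, an observed value of 3.0 m and equal error variances of 0.25 m², the gain is 0.5 and the corrected value is 2.5 m, halfway between model and observation.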
The Ensemble Kalman filter (EnKF) was proposed by [43,44] to correct statistical model output. The best statistical estimate of the system is obtained with a simple technique that combines the value returned by the numerical model with the matching sets of observations.
EnKF is better able to handle nonlinear dynamics than traditional linear Kalman filters, as it is based on the propagation of multiple state vectors and captures non-Gaussian aspects of the state distribution. It uses Monte Carlo simulation to sample state space, which makes it computationally possible for high-dimensional systems. It can also be applied to problems ranging from simple linear systems to complex highly nonlinear models.
Different correction techniques based on the EnKF formulation can be used, depending on the statistical prediction problem under study. In this work, however, Kalman filters are applied to the model output to directly correct the delayed forecast.
The Kalman filter-based process implemented here uses the formulation presented below and is applied to the significant wave height:
$$x^{a} = x^{f} + K\left(y + \varepsilon - H x^{f}\right)$$
where $x^{a}$ is the analysis, $x^{f}$ the SWAN model simulations, $y$ is the observation vector, $\varepsilon$ is the observation noise, and H is the observation operator that interpolates model states to the observation positions. K is the Kalman gain, given by:
$$K = C H^{T}\left(H C H^{T} + R\right)^{-1}$$
where C is the ensemble covariance and R defines the observational error covariance, which represents the uncertainties in the observations and is typically calculated from the variance of the observation errors.
The innovation (or residual) ($y - H x^{f}$), which is the difference between the actual measurement and the predicted state, is multiplied by the Kalman gain and added to the value that the model returns at each time step t to correct it.
The Kalman gain adjusts each ensemble member according to the difference between the observation and the model prediction, which is weighted by the uncertainty in the observations and the model. When determining the number of ensemble members, there is a trade-off between computational cost and state estimate quality. The number of ensembles (N) in this investigation is maintained constant at each time step.
For this study, an EnKF is used to correct wave hindcast data using satellite observations. The procedure initialises an ensemble from the hindcast, incorporates a land-sea mask (to ignore land points in the model), and defines an observation operator (H) to interpolate model data to observation locations. At each time step, it identifies matching observations, computes the background error covariance (C), and applies the Kalman gain (K) to update the ensemble. An error spread adjustment, λ, is used to maintain ensemble spread and prevent underestimating uncertainty. Finally, the corrected wave field is obtained by averaging the updated ensemble members, ensuring a more accurate representation of significant wave heights.
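A single analysis step of this kind can be sketched in Python with NumPy. The sketch is a generic perturbed-observation EnKF under stated assumptions and is not the implementation used in the study: the state vector is assumed to hold only sea points (land points already removed by the mask), the observation errors are assumed uncorrelated with a single variance `obs_var`, and the spread adjustment λ is assumed to act as a multiplicative inflation of the ensemble anomalies. The function name `enkf_analysis` and its arguments are hypothetical.

```python
import numpy as np

def enkf_analysis(ensemble, obs, H, obs_var, inflation=1.0, rng=None):
    """One EnKF analysis step with perturbed observations.

    ensemble : (n_state, n_members) background Hs at sea points only
    obs      : (n_obs,) observed Hs
    H        : (n_obs, n_state) linear observation operator
    obs_var  : scalar observation-error variance (R = obs_var * I)
    inflation: multiplicative spread adjustment (lambda, assumed form)
    """
    rng = np.random.default_rng() if rng is None else rng
    ensemble = np.asarray(ensemble, dtype=float)
    obs = np.asarray(obs, dtype=float)
    n_members = ensemble.shape[1]

    # Inflate the anomalies about the ensemble mean to maintain spread.
    mean = ensemble.mean(axis=1, keepdims=True)
    anomalies = inflation * (ensemble - mean)
    background = mean + anomalies

    # Ensemble (background error) covariance C and observation covariance R.
    C = anomalies @ anomalies.T / (n_members - 1)
    R = obs_var * np.eye(obs.size)

    # Kalman gain K = C H^T (H C H^T + R)^-1.
    K = C @ H.T @ np.linalg.inv(H @ C @ H.T + R)

    # Each member sees the observations plus its own noise realisation.
    eps = rng.normal(0.0, np.sqrt(obs_var), size=(obs.size, n_members))
    innovation = obs[:, None] + eps - H @ background
    analysis = background + K @ innovation

    # The corrected field is the mean of the updated members.
    return analysis, analysis.mean(axis=1)
```

In this sketch a larger ensemble gives a less noisy estimate of C at a higher cost, which is the trade-off mentioned in the text. For large grids the explicit covariance matrix would normally be avoided, but it keeps the example close to the equations.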
Three buoys from the Azores (Table 4) are then used to validate the model before and after the correction (Figure 3). The statistics were calculated for the three buoys, and scatter graphs comparing the measured values to the simulated values were created.
3. Results
To evaluate the impact of the correction process, the corrected results were compared to buoy data. This study spans 6 years and uses three buoys to validate the correction method, employing the same statistical parameters as before.
Figure 4, Figure 5 and Figure 6 show Q-Q plots of Hs that compare the quantiles of the observed data with the quantiles of the predicted data for the hindcast before and after the correction. The blue dots represent the quantiles of the observed data plotted against the quantiles of the predicted data. The red line is the 1:1 line, on which the observed quantiles would perfectly match the predicted quantiles. If the blue dots lie close to the red line, the observed and predicted data have similar distributions. As can be observed, before the correction the points for all buoys deviate more from the diagonal line, especially at the extremes (tails of the distribution), suggesting that the model had a bias, particularly for extreme values. After the correction, the points are closer to the diagonal line, indicating a better fit between the simulated and observed data.
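The points of such a Q-Q plot can be obtained with the Python sketch below. It is an illustration only: the choice of 99 evenly spaced probabilities between 0.01 and 0.99 and the helper names `qq_points` and `mean_offset_from_diagonal` are assumptions of this example, not details taken from the study.

```python
import numpy as np

def qq_points(observed, predicted, n_quantiles=99):
    """Return matching quantiles of the observed and predicted samples."""
    probs = np.linspace(0.01, 0.99, n_quantiles)
    return np.quantile(observed, probs), np.quantile(predicted, probs)

def mean_offset_from_diagonal(q_obs, q_pred):
    """Mean absolute distance of the Q-Q points from the 1:1 line."""
    return float(np.mean(np.abs(q_pred - q_obs)))
```

Plotting `q_pred` against `q_obs` together with the 1:1 line reproduces the layout described above, and the mean offset gives a single number for how far the two distributions are apart.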
Table 5 presents the statistical outcomes for the hindcast after the EnKF correction method is applied. The results show that the statistical parameters have improved. The scatter index is lower (from 0.81 to 0.29 for B1, from 0.66 to 0.28 for B2, and from 0.34 to 0.10 for B3), indicating a reduced relative error in predictions. The overall errors have also decreased, meaning the model is now more accurate (RMSE from 1.46 to 0.53 for B1, from 1.10 to 0.46 for B2 and from 1.05 to 0.46 for B3; MAE from 1.21 to 0.41 for B1, from 0.88 to 0.33 for B2 and from 0.86 to 0.32 for B3). The model-buoy correlation has improved (from 0.69 to 0.91 for B1, from 0.65 to 0.88 for B2 and from 0.80 to 0.84 for B3), suggesting that the model is capturing trends more effectively. The NSE, specifically for buoy B1, has dropped from 1.47 to 0.67, now within a reasonable range (0–1), making it more realistic.
Additionally, the correction index is computed. This indicator is defined as the percentage decrease in the RMSE, so it measures how much closer the corrected model results are to the observations than the original (uncorrected) results. The correction index (CI) therefore indicates the skill level of the correction scheme:
$$CI = \frac{RMSE_{WC} - RMSE_{C}}{RMSE_{WC}} \times 100\%$$
RMSEWC stands for the root mean square error (RMSE) of the difference between the simulated and observed wave parameters without correction, while RMSEC represents the RMSE of the difference between the simulated and observed wave parameters with correction. The correction approach is better when the index value is nearer 100%, since this indicates that the estimated wave parameters are more in line with the data. Negative values, on the other hand, show that the correction weakens the initial agreement.
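Since the index is the percentage decrease in RMSE, it can be evaluated with a few lines of Python. The function name `correction_index` is chosen here for illustration; the input values in the usage lines are the RMSE values quoted in the text for buoys B1 and B3.

```python
def correction_index(rmse_without, rmse_with):
    """Percentage reduction of the RMSE achieved by the correction."""
    if rmse_without <= 0:
        raise ValueError("rmse_without must be positive")
    return 100.0 * (rmse_without - rmse_with) / rmse_without

print(round(correction_index(1.46, 0.53), 1))  # B1 -> 63.7
print(round(correction_index(1.05, 0.46), 1))  # B3 -> 56.2
```

A correction that increases the RMSE gives a negative index, in line with the interpretation given above.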
Table 6 displays the results for each buoy, together with the CI and the correlation coefficient (both pre- and post-correction). The CI is 63.7% for buoy B1, 54.55% for B2, and 56.2% for B3. These results show that EnKF is a relatively good correction method. The correlation coefficient shows that using the EnKF correction approach improved the results significantly.
4. Discussion
This work continues a previous study [
21], which enhanced wave forecasting along the ship’s navigation track. By treating the ship as a wave rider, the directional spectrum of the wave system was determined, and the measurement data was assimilated into the wave forecast.
This work aims to improve the hindcast system by incorporating satellite wave observations to provide a more accurate wave hindcast over a larger geographical area. Such an area cannot be characterised by a single-point measurement, so the approach resorts to satellite measurements spread over the modelling area.
Table 5 displays the statistical outcomes of the hindcast following the implementation of the correction approach. The results show that statistical parameters have improved. The model correlation for all three buoys has improved, indicating a strong correlation between observations and assimilated data.
Figure 3 compares the Hs Q-Q diagrams of the observed data with the predicted data before and after the correction of the hindcast.
The model performance improved significantly after assimilating satellite data, as shown by the lower bias, RMSE, and MAE, along with a stronger correlation. While NSE decreased, it now reflects a more reliable model than an overfitted one. This suggests that satellite data correction improved model generalization and realism.
5. Conclusions
A 6-year hindcast system is used to evaluate how much the Hs hindcasts improve when a correction technique is applied to the simulated model output. The hindcast is validated with three buoys located in the Azores area and corrected using satellite significant wave height data from the GlobWAVE project.
A correction is required because the validation of the hindcast system reveals a discrepancy between buoy observations and SWAN simulations. The EnKF method presents good results and can be considered appropriate for reducing general errors. There is a significant decrease in the error statistics and an increase in the correlation coefficient. The Q-Q plots show that the predicted data follow the distribution of the observed data reasonably closely, indicating that the model effectively predicts values that match the observed distribution. Slight deviations suggest minor areas for potential refinement, but the model appears to be performing well overall.
After the specified correction processes are applied to the hindcast data, the correction is evaluated by assessing the scatter index relating the pre- to the post-corrected data sets. These investigations demonstrate the method’s suitability.
In summary, it is demonstrated that an EnKF improves the accuracy of the wave model predictions and provides a robust framework for integrating satellite observation data into the model results. Moreover, the hindcast correction study is highly relevant to real-time forecasting because it increases the precision and dependability of wave predictions. The hindcast is adjusted for historical inaccuracies by including satellite observations. This helps to identify and mitigate systematic biases and thus results in a more accurate starting point for future projections. The initial conditions significantly impact the prediction accuracy in real-time forecasting. A more realistic ocean state can be produced by applying data correction to historical data, which improves the starting point for real-time forecasts.
EnKF is a potent instrument for the Azores that enhances hurricane tracking, ocean monitoring, weather forecasting, and renewable energy optimization. Because of its capacity to effectively integrate a variety of data sources, it is useful for both practical applications and scientific study in this dynamic region.