3.1. Temporal Analysis Using Buoy Data
The autocorrelation function is an excellent means of analyzing the impact of time lags on correlation coefficients.
Figure 2 displays the autocorrelation curves for each buoy, examining time displacements up to 12 h. The rapid decay in correlation, particularly for U10, suggests that the temporal criteria should be more stringent for U10 than for Hs. This also indicates that metocean prediction is typically more challenging for U10 than for Hs, as there is more short-term variation in wind speed than in wave height. For Hs, correlation values above 0.95 are observed within the first 3 h, whereas for U10 the correlation stays above 0.95 only within the first hour. This is consistent with the widely used time resolution of output files from numerical prediction systems, which is typically 1 h for wind speed and 3 h for wave heights. In other words, providing hourly information for forecast users is crucial when dealing with U10, while 3 h would suffice for Hs. Furthermore, correlations for Hs drop below 0.90 only after 6 h, providing valuable information for the validation of wave forecasts; i.e., if the 6 h forecast has a correlation coefficient lower than 0.90, then using the current wave observation as a 6 h forecast (persistence) would provide better information (in terms of correlation coefficient only).
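The lagged autocorrelation discussed above can be sketched as follows. This is a minimal illustration, assuming an hourly, evenly sampled series; the function names and the synthetic signals are hypothetical and are not the buoy data or the processing used in this study.

```python
import numpy as np

def lagged_autocorrelation(series, max_lag):
    """Pearson correlation between x(t) and x(t + lag), for lag = 0..max_lag samples."""
    x = np.asarray(series, dtype=float)
    cc = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        a = x[: x.size - lag]
        b = x[lag:]
        ok = np.isfinite(a) & np.isfinite(b)  # skip gaps in the record
        cc[lag] = np.corrcoef(a[ok], b[ok])[0, 1]
    return cc

def first_lag_below(cc, threshold):
    """Smallest lag at which the correlation falls below the threshold, or None."""
    below = np.flatnonzero(cc < threshold)
    return int(below[0]) if below.size else None

# Synthetic hourly series (NOT buoy data): a smooth signal and a noisier one,
# loosely mimicking a slowly varying Hs and a more variable U10.
rng = np.random.default_rng(0)
t = np.arange(24 * 60)
smooth = 2.0 + np.sin(2 * np.pi * t / 120.0)
noisy = smooth + rng.normal(0.0, 0.5, t.size)

cc_smooth = lagged_autocorrelation(smooth, 12)
cc_noisy = lagged_autocorrelation(noisy, 12)
```

With one sample per hour, `first_lag_below(cc, 0.95)` gives the time lag, in hours, at which a persistence estimate stops reaching a correlation of 0.95.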
In
Figure 2, it is interesting to note the differences between the buoys. After a 3 h time lag, the buoys in the Tropical Pacific Ocean (Hawaii) exhibit higher correlations compared to the other stations, while the buoys in the Extra-tropical Pacific Ocean (northwestern coasts of the USA), at higher latitudes, display lower values, particularly for U10. Moreover,
Figure 2 illustrates that the differences among buoys are larger for U10 than for Hs. At a 12 h time lag, the average autocorrelation for U10 drops to 0.60, while for Hs, it is 0.82, demonstrating how wave fields function as low-pass filters of the surface wind fields.
The increase in normalized root mean square differences as a function of the time lag is depicted in
Figure 3, which extends to 24 h. As a normalized metric, it can be interpreted as a percentage measure of RMSD when multiplied by 100. Consistent with
Figure 2, the normalized RMSD shows a more rapid increase for U10 than for Hs, particularly in the first 12 h. When comparing the buoys, the RMSD is lower for the Tropical Pacific Ocean buoys, which aligns with the higher autocorrelations in
Figure 2. On the other hand, for Hs, the RMSD is higher in the Gulf of Mexico than the other clusters. In terms of values, the two plots of
Figure 3 illustrate a fast increase in RMS differences over a displacement of a few hours: the average normalized RMSD reaches 20% after 3 h for U10 and after 7 h for Hs. At a 1 h time lag, it is already 12% for U10 and 8% for Hs.
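A minimal sketch of the normalized RMSD as a function of time lag is given below. The normalization by the mean of the reference series is an assumption made for illustration only; the definition adopted in this paper may differ. The function name is hypothetical.

```python
import numpy as np

def normalized_rmsd_by_lag(series, max_lag):
    """RMSD between x(t) and x(t + lag), divided by the mean of x(t).

    ASSUMPTION: normalization by the mean of the reference series.
    Multiply the result by 100 to read it as a percentage.
    """
    x = np.asarray(series, dtype=float)
    out = np.zeros(max_lag + 1)  # lag 0 has zero difference by construction
    for lag in range(1, max_lag + 1):
        ref = x[: x.size - lag]
        shifted = x[lag:]
        ok = np.isfinite(ref) & np.isfinite(shifted)  # skip gaps
        rmsd = np.sqrt(np.mean((shifted[ok] - ref[ok]) ** 2))
        out[lag] = rmsd / np.mean(ref[ok])
    return out

# Synthetic hourly series (NOT buoy data)
t = np.arange(24 * 60)
hs_like = 2.0 + np.sin(2 * np.pi * t / 120.0)
nrmsd = normalized_rmsd_by_lag(hs_like, 24)
```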
Table 2 summarizes the discussion so far, presenting the RMSD, SI, and CC for time lags ranging from 1 to 12 h. For the first hour (1 h time lag), the RMSD of Hs is already very close to the accuracy of the wave buoy, which is equal to 0.2 m according to the NDBC [
38], and it is very similar to the RMSE of calibrated forecast products using WAVEWATCH III, which ranges from 0.2 to 0.5 m according to [
36,
69,
70]. The same holds for U10, for which the RMSD at a 1 h time lag already exceeds the NDBC accuracy of 0.55 m/s.
The effect of time lag on the scatter error is also notable in
Table 2, with values exceeding 10% in 3 h for Hs and only 1 h for U10. The results of
Figure 1 and
Figure 2 and
Table 2 are in agreement with Monaldo [
28], who used one month of buoy observations in November 1985. He found the RMSD reached 0.5 m with an approximate time lag of 4 h, while
Table 2 shows a lag of 6 h on average, although the variation among buoys must be considered. At a 3 h time lag, the CC decays to 0.96 for Hs and to 0.87 for U10. This significant impact of time displacement on the statistics indicates rapid changes in wind and wave conditions in a short period of time.
In order to briefly explore the variations in Hs and U10 with time,
Figure 4 and
Figure 5 were generated, including the variance spectrum [
71] and time-series plots, to illustrate some events.
Figure 4 suggests that the main changes in the wind and wave conditions do not necessarily occur within a few hours but over periods longer than 24 h, in response to large-scale meteorological systems. The daily cycle is more evident in U10 than in Hs, and the most significant modifications in the metocean conditions occur at 48 h and beyond. This finding may appear to contradict the discussion above (
Figure 2 and
Figure 3 and
Table 2); however, despite the great influence of synoptic systems, there is still a secondary high-frequency effect embedded in the variance that can be visualized in
Figure 5. Although the events illustrated last two days or more, the time evolution of Hs shows short rises and falls that can reach more than one meter in one hour, including occasional periods with approximately 15% of hourly variations embedded in the low-frequency component. Since this type of short fluctuation is occasional, apparently random, varies in amplitude, and lacks a constant pattern, it is not highlighted in the variance spectrum.
It is important to note that
Figure 2,
Figure 3 and
Figure 4 provide bulk metrics or average patterns. However, specific conditions and events may cause significant variations in autocorrelation and RMSD, which should be considered when establishing criteria. Scatter plots are a better way to visualize the time lag in this case, and this approach has been extensively explored by many authors in this type of study (e.g., [
53]).
Figure 6 and
Figure 7 show scatter plots of U10 and Hs with time lags ranging from 1 h to 24 h. They clearly demonstrate a much larger spread of U10 compared to Hs. The hot colors in the plots indicate the highest density, but it is also important to analyze the overall distribution of points and the largest differences observed.
The scatter plots of Hs (
Figure 7) show very small differences between time-series with time lags of up to three hours, although the larger waves for the 3 h time lag display some concerning discrepancies. Regarding U10 (
Figure 6), the first plot with a 1 h time lag already exhibits significant scattering. Therefore, a temporal criterion above one hour is not recommended.
In summary, the analyses suggest that temporal criteria of 1 or 2 h would be appropriate for Hs, while 1 h or less would be recommended for U10, although this may vary depending on the location (four locations addressed) and conditions (points of scatter plots). Since it is challenging to define different temporal collocation criteria for Hs and U10 that vary with time and location, a conservative compromise can be achieved by using a maximum of 1 h for the temporal distance between records to be averaged. In practical terms, a limit of 1800 s (plus and minus) centered at the hourly buoy time defines a suitable temporal criterion for altimeter collocation, which is consistent with the fundamental studies of Monaldo [
28] and Ribal and Young [
23].
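The temporal criterion of plus and minus 1800 s around the hourly buoy time can be sketched as a nearest-record match with a tolerance. This is an illustrative sketch, not the code used in this study; times are assumed to be in seconds since a common epoch, the buoy times are assumed to be sorted with at least two records, and the function name is hypothetical.

```python
import numpy as np

def nearest_buoy_index(alt_time, buoy_times, half_window=1800.0):
    """Index of the nearest buoy record for each altimeter time.

    Returns -1 where the nearest buoy record is farther than half_window
    seconds. ASSUMPTIONS: times in seconds since a common epoch,
    buoy_times sorted in ascending order and containing >= 2 records.
    """
    alt_time = np.asarray(alt_time, dtype=float)
    buoy_times = np.asarray(buoy_times, dtype=float)
    # Position of each altimeter time within the sorted buoy times
    idx = np.searchsorted(buoy_times, alt_time)
    idx = np.clip(idx, 1, buoy_times.size - 1)
    left = buoy_times[idx - 1]
    right = buoy_times[idx]
    # Choose the closer of the two neighbouring buoy records
    idx = np.where(alt_time - left <= right - alt_time, idx - 1, idx)
    too_far = np.abs(buoy_times[idx] - alt_time) > half_window
    return np.where(too_far, -1, idx)

# Hourly buoy times and a few altimeter overpass times (synthetic values)
buoy_times = [0.0, 3600.0, 7200.0]
alt_times = [100.0, 1800.0, 1900.0, 5400.0, 9100.0]
match = nearest_buoy_index(alt_times, buoy_times)
```

Altimeter records that share the same buoy index (and are not flagged with -1) form the group that is subsequently averaged in space.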
3.2. Spatial Analysis Using Altimeter Data
The spatial analysis started by applying temporal criteria, which involved selecting altimeter records where the overpass occurred within 30 min of the hourly buoy data. This was followed by the methodology steps outlined in
Section 2.3, where track sections passing very close to the buoy positions were selected. The next figures are based on Hs and U10 AODN altimeter data without any calibration. Later, in
Section 3.3, the calibration proposed by Ribal and Young [
23] is evaluated.
Figure 8 provides a vast amount of information regarding altimeter–altimeter and altimeter–buoy comparisons.
Figure 8A,D show scatter plots of altimeter measurements compared to the single closest altimeter record to the buoy’s position for each satellite track passage—presented as the “expected difference” in the plots, a term used by Monaldo [
28]. The scatter plots show a growing spread as the distance increases, especially beyond 100 km. The density at lower expected difference values is higher at distances between 5 and 50 km. Conversely, small distances also present some points with large differences, while large distances also contain pairs with small differences. However, the general pattern indicates the best agreement within the first 50 km. This can be confirmed by counting the number of points with differences above 1 m in
Figure 8A. Therefore, the scatter plots provide a first indication of suitable spatial criteria that should be restricted to the first 50 km.
Figure 8B confirms the large discrepancies for altimeter records more than 100 km apart. The plot also provides the first indication of good agreement between altimeter and buoy data in terms of Hs, with the black solid curve being very close to the dashed red curve. This result is not replicated in
Figure 8E for U10, where the differences between altimeter and buoy data are much larger than for Hs, highlighting the importance of altimeter wind calibration described in Ribal and Young [
23]. The spread of expected differences for U10 (
Figure 8D,E) is also very high, but it is reduced within the first 25 km, which can be observed by counting the number of points above 5 m/s in
Figure 8D.
A clearer representation of the average increase in mean differences within the first 50 km is presented in
Figure 8C,F. The curve for Hs crosses the 0.20 m value (a level associated with buoy accuracy according to the NDBC and linked to high-quality simulations using WAVEWATCH III) at approximately 37 km. It reaches a mean difference of 0.21 m at the 50 km distance, which is small in terms of mean difference, but at 20 km, it is only 0.17 m. The latter represents a sampling variability with low RMS difference that has the potential to benefit the validation and analysis of highly accurate products. The same curve for U10 (
Figure 8F) shows much lower values for mean differences associated with spatial displacement, despite the large spread and occasional large differences encountered in
Figure 8D. This means that the arithmetic mean can successfully filter out those less frequent large discrepancies, once again demonstrating the benefits of space–time averages for altimeter collocation instead of selecting the single closest altimeter record.
The current results show lower differences as a function of distance than those reported by Monaldo [
28], who found differences of 0.5 m at 100 km and 0.2 m at 20 km, while
Figure 8 shows the 0.2 m level being crossed at 37 km. It should be noted that Monaldo’s [
28] observations were based on GEOSAT altimeter data, while
Figure 8 was calculated using 3133 records from JASON3. Hwang et al. [
67] found that when spatial lags are less than 10 km, the differences in Hs are approximately 0.1 m, which is closer to the present results of around 0.15 m. It is worth remembering that these differences are among individual satellite records or direct comparisons against buoy data and do not necessarily represent the result found after collocation, which involves computing the average of all altimeter records inside a circle to yield a single value (mean) per transect. In other words, the spatial separation criterion defines the radius of a circle within which satellite data are selected, so an altimeter transect that passes directly over the buoy will have a transect length of 100 km when the traditional 50 km criterion is utilized. More distant passes will define shorter chords of the circle. Each transect within the circle will only define one collocated value of Hs and U10, reducing the data size.
As described earlier, the analysis in this section only considers altimeter transects in which at least one record is very close to the buoy, within 10 km.
Figure 9 shows the number of JASON3 altimeter records selected for different spatial criteria, ranging from 10 to 200 km. Considering that the distance between consecutive JASON3 measurements is 5.87 km, it is expected that a 10 km radius (20 km diameter) will select only two or three altimeter records. Moving to a 25 km radius increases the average number of records to eight, which is above the minimum number of five points discussed in Ribal and Young [
23], and eight is the default number of neighbors used in the Python pyresample kd_tree module. The commonly used 50 km criterion selects around 15 records, while 100 km and 200 km criteria select, on average, 30 and 60 records, respectively. Therefore, the large RMS differences at larger distances must be balanced with the number of points to be averaged in order to avoid using an overly restrictive criterion, such as 10 km, which would provide an insufficient number of records for a proper final estimate.
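The relation between the spatial criterion and the number of records can be approximated geometrically from the chord that a track cuts through the circle. The sketch below assumes a locally straight track and a complete sequence of records at the 5.87 km spacing; it therefore gives an upper bound, and the averages reported here are lower because of track offsets from the buoy and missing or rejected records. The function names are hypothetical.

```python
import math

def chord_length_km(radius_km, offset_km):
    """Length of a straight track inside a circle of the given radius.

    offset_km is the closest distance between the track and the circle
    centre (the buoy). ASSUMPTION: the track is locally a straight line.
    """
    if offset_km >= radius_km:
        return 0.0
    return 2.0 * math.sqrt(radius_km ** 2 - offset_km ** 2)

def expected_records(radius_km, offset_km=0.0, spacing_km=5.87):
    """Upper-bound estimate of the number of records inside the circle."""
    return chord_length_km(radius_km, offset_km) / spacing_km

for radius in (10, 25, 50, 100, 200):
    n_over = expected_records(radius)           # track directly over the buoy
    n_off = expected_records(radius, 10.0)      # track passing 10 km away
    print(f"radius {radius:3d} km: up to {n_over:.1f} records "
          f"(direct pass), {n_off:.1f} (10 km offset)")
```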
A new comparison and assessment must be re-run at this point, using transect averaging results for different criteria as a sensitivity analysis of the spatial criterion.
Figure 10 presents the scatter plots comparing the effect of different spatial criteria on the collocated satellite data. It is interesting to note the lower scattering, with points falling close to the main diagonal, when compared to
Figure 6 and
Figure 7, associated with non-averaged measures. This once again emphasizes the benefits of the averaging process. The results for 10 km, 25 km, and 50 km are very similar, with an increasing spread associated with 100 km and 200 km radii. The upper points representing the most severe intensities start to diverge from the main diagonal only in the 100 km and 200 km plots. Therefore,
Figure 10 proves the stability and robustness of the collocation using the spatial mean, especially for radii of 25 and 50 km. The problem of extremely large scattering of U10 (
Figure 6) has been solved by using spatial averaging, making the methodology even more relevant when wind speeds are included. In this section, the estimates calculated using the spatial criterion of 10 km were used as a reference. The next section will select the buoy data for that purpose.
Statistical metrics (Equations (2)–(5)) were calculated to further investigate the influence of the spatial criteria on the final estimates of Hs and U10.
Figure 11 presents the scatter and systematic differences together. It is possible to see a very close agreement between results using the spatial criteria of 10 to 50 km, followed by a progressive divergence for 100 km, and magnified discrepancies for 200 km—in both error metrics. Within 50 km, the scatter differences remain below 10% and the systematic differences below 1% for both U10 and Hs.
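For reference, the four metrics can be sketched as below. The definitions used here (bias as the mean difference, SI as the standard deviation of the differences divided by the mean of the reference) are common choices and are an assumption; they may not coincide exactly with Equations (2)–(5). The function name is hypothetical.

```python
import numpy as np

def error_metrics(estimate, reference):
    """Bias, RMSD, scatter index (SI) and correlation coefficient (CC).

    ASSUMPTION: SI is the standard deviation of the differences
    normalized by the mean of the reference; other definitions exist.
    """
    e = np.asarray(estimate, dtype=float)
    r = np.asarray(reference, dtype=float)
    ok = np.isfinite(e) & np.isfinite(r)  # use only valid pairs
    e, r = e[ok], r[ok]
    diff = e - r
    bias = diff.mean()
    rmsd = np.sqrt(np.mean(diff ** 2))
    si = np.sqrt(np.mean((diff - bias) ** 2)) / r.mean()
    cc = np.corrcoef(e, r)[0, 1]
    return {"bias": float(bias), "rmsd": float(rmsd),
            "si": float(si), "cc": float(cc)}

# Synthetic check: an estimate that is offset by a constant 0.5
reference = np.array([1.0, 2.0, 3.0, 4.0])
metrics = error_metrics(reference + 0.5, reference)
```

In this synthetic case, the constant offset appears entirely in the bias and RMSD, while SI is zero and CC is one, which illustrates how the scatter and systematic components are separated.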
Table 3 provides statistical results for four metrics (Equations (2)–(5)) that further complement
Figure 11. The bias is very low for both Hs and U10, even at greater radii. Thus, the main impact of increasing the spatial averaging radius is an increase in scatter errors (with a consequent increase in the RMSD) and a decrease in the correlation coefficient. The scatter differences are above 10% for radii equal to and above 50 km for Hs and 200 km for U10.
Figure 10 and
Figure 11 and
Table 3 show the significant effect of spatial averaging on the collocation of Hs and U10 when compared to
Figure 8, which contains the original altimeter records. Using the altimeter tracks as shown in
Figure 8 may lead to occasional very discrepant values and strong deterioration when considering further distances. However, when the spatial mean is applied, it smooths out the discrepant values, providing more stable estimates with low scatter differences and better results at larger radii. Even so, the results from
Figure 10 and
Figure 11 and
Table 3 still indicate that the upper limit of 50 km is a suitable spatial criterion for altimeter collocation. However, the results so far are related to altimeter–altimeter comparison and not direct validation against buoy measurements, which is essential to consider and is performed in the next section.
3.3. Spatial Averaging Method and Altimeter Validation
In this section, the altimeter dataset is expanded from JASON3 only to JASON3, JASON2, CRYOSAT2, JASON1, HY2, SARAL, and SENTINEL3A. These satellite missions have high accuracies and demonstrate close agreement in the cross-validations performed by Ribal and Young [
23], providing a vast dataset of reliable information from altimeters. Additionally, the AODN-calibrated variables of Hs and U10, namely Hs
c and U10
c, were also included. The temporal criterion of 1800 s was first applied, and two spatial criteria of τ = 25 and 50 km were tested. The methodology was the same as in the previous section, but the comparisons were now performed against buoy data. Apart from the arithmetic ensemble mean, two other averaging methods were included: (i) inverse distance weighting using a simple linear function (named LIDW); and (ii) the same inverse distance weighting with a Gaussian function applied instead of a linear decay (named GF). Inverse distance weighting rests on the intuitive assumption that the closer a point is to the position being estimated, the more weight it has in the averaging process. The calculation was performed using the Python package pyresample.kd_tree.
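The three averaging methods can be illustrated with plain NumPy, as sketched below. This is not the pyresample call used in this study. The linear weight that decays to zero at the radius and the Gaussian width of half the radius are assumptions chosen for illustration, and the function name and parameters are hypothetical.

```python
import numpy as np

def collocated_mean(values, dist_km, radius_km, method="mean", sigma_km=None):
    """Single collocated value from altimeter records within radius_km.

    method: "mean" (arithmetic), "lidw" (linear inverse distance
    weighting) or "gf" (Gaussian weighting).
    ASSUMPTIONS: the linear weight is 1 - d/radius, and the default
    Gaussian width is radius/2.
    """
    v = np.asarray(values, dtype=float)
    d = np.asarray(dist_km, dtype=float)
    inside = (d <= radius_km) & np.isfinite(v)
    v, d = v[inside], d[inside]
    if v.size == 0:
        return float("nan")
    if method == "mean":
        w = np.ones_like(d)
    elif method == "lidw":
        w = 1.0 - d / radius_km
    elif method == "gf":
        sigma = radius_km / 2.0 if sigma_km is None else sigma_km
        w = np.exp(-(d ** 2) / (2.0 * sigma ** 2))
    else:
        raise ValueError(f"unknown method: {method}")
    if w.sum() == 0.0:  # e.g. a single record exactly at the radius with "lidw"
        return float("nan")
    return float(np.sum(w * v) / np.sum(w))

# Synthetic example: two records inside a 50 km radius and one outside
hs = [1.0, 3.0, 100.0]
dist = [0.0, 25.0, 60.0]
results = {m: collocated_mean(hs, dist, 50.0, method=m)
           for m in ("mean", "lidw", "gf")}
```

Both weighted variants pull the estimate toward the record closest to the buoy, whereas the arithmetic mean treats all records inside the circle equally.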
Table 4 and
Table 5 present the final assessment, where the poorest results are highlighted in red, and the best results are highlighted in green. Initially, it is clear that using the single nearest altimeter record to the buoy measurement does not provide optimal estimates compared to using the spatio-temporal averaging of altimeter records. Its poorer performance is evident in the RMSE, SI, and CC, and is more pronounced for Hs than for U10. Next, the comparison between the three averaging methods shows very similar results, with minor differences in the third decimal of the error metrics, so it is inconclusive at this stage which method performs best. Thus, the simple arithmetic mean, which is widely used, remains a good option, at least for the dataset and validation considered in this study.
The most notable impact, leading to a great improvement in the error metrics, is found in the calibrated variables, Hs
c and U10
c. The AODN calibration [
23] resulted in a reduction in Bias, RMSE, and SI, combined with an increase in CC. This effect is more pronounced for wind speed, where the RMSE of U10 of 2.02 m/s (τ = 50 km, from arithmetic mean) was reduced to 1.68 m/s for U10
c. For the same comparison, the CC improved from 0.79 to 0.89. However, the bias shifted from −0.49 m/s (underestimation of altimeter winds) to 0.76 m/s (overestimation of altimeter winds), which warrants further investigation.
The best performance in
Table 4 and
Table 5 is observed in Hs
c, where the calibration succeeded in improving all four metrics, and the results exhibit almost no bias (around 1 cm only). The RMSE of Hs
c is 0.21 m, the scatter errors are at 9%, and the correlation coefficient is 0.98.
Figure 12 presents the results for Hs
c and confirms the excellent performance of collocated altimeter data shown in
Table 4. The Hs
c quantiles closely follow the main diagonal of QQ-plots, and the scatter plots also show the points not far from the diagonal of perfect agreement, ranging from small values to the highest ones above the 99th percentile.