1. Introduction
The development of strategies to reduce the impact of coastal erosion and flooding must be informed by quantitative estimates of the wave height and period that a site is likely to experience. Where long records of observations are available, the methods of extreme value analysis [1] are often used to estimate the wave height that has a specified probability of exceedance in a year, and these values are used in the design of coastal structures. A succinct summary of the history of the approach in engineering applications is provided in [2]. Unfortunately, even the longest data records seldom exceed twenty years, so the wave exceedance height estimates are generally based on an extrapolation of a statistical model chosen to fit the observed frequency of much more likely events. The results are, therefore, sensitive to the model selected. Observations of wind velocity have been recorded at many more locations, and for longer time periods, so many projects develop a design wind condition and then use an empirical wind-driven wave formula (e.g., US Army Corps of Engineers) [3,4] to develop coastal design guidance. More recently, physics-based mathematical models of the mechanisms of wave growth, propagation, and decay have been used to synthesize long time series of wave statistics at locations where design parameters are required. A few approaches use records of observed winds to force models that compute the wave field evolution, and others have used atmospheric models or parametric representations of the path and character of hurricanes [5,6,7]. For instance, it is useful to employ numerical wave models, such as the phase-resolving Boussinesq-type model [8,9], to predict coastal wave conditions by simulating real physical processes.
Extreme value analysis of significant wave heights has been extensively studied in offshore and coastal safety [10,11]. However, this analysis may be limited by short record durations and gaps due to instrument failures.
This paper describes and assesses a novel statistical approach to extend the data record length of wave observations that exploits the availability of long records of wind observations at coastal sites and shorter near-shore buoy-based wave observations, and the space-time correlation of wind and wave parameters. The method allows the synthesis of longer records of coastal wave height for use in extreme value estimation.
Wave data is distributed to the public through two main portals, the National Oceanic and Atmospheric Administration (NOAA) and the United States Army Corps of Engineers (USACE) databases. Buoy fleets are independently owned and operated by universities, private research institutes, and government agencies.
Our analysis relies on NOAA datasets as distributed through NOAA’s National Data Buoy Center (NDBC). We consider hourly data from the NOAA buoy in the Central Long Island Sound (see Figure 1) on these variables: significant wave height (referred to as wave height in the rest of the paper), wind direction, and wind speed. We also employ measurements of the wind speed and direction made at Sikorsky Memorial Airport, Bridgeport, CT (USAF station ID 725040, available at ftp://ftp.ncdc.noaa.gov/pub/data/noaa, accessed on 1 October 2022). The goal of this analysis is to
- (a) verify the similarity (association) between the wind data from the buoy and its proximal weather station (see Section 2),
- (b) using hourly observations from a training data set, build a predictive time series regression model adjusted for nonlinear effects for buoy wave heights as a function of its history and functions of lagged wind variables from the buoy (as exogenous predictors) (see Section 3),
- (c) build and train a model for buoy wave heights by replacing the buoy wind data in (b) with the Sikorsky station wind data after transforming it (see Section 2.2),
- (d) compare the in-sample and out-of-sample predictive accuracy from the model in (c) to the model in (b), in order to corroborate the use of transformed wind data from the Sikorsky coastal station (see Section 3.5.1 and Section 3.5.2),
- (e) compute ensemble hindcasts of buoy wave heights for the past several decades based on the model we train in (c) using transformed Sikorsky station wind data from 2005–2013 (see Section 4), and
- (f) conduct extreme value analysis and estimate the return values for wave heights using (i) the ensemble hindcasted wave heights for 1974–2004 obtained from (e), and (ii) the observed buoy wave heights from 2005–2013 (see Section 5).
The flowchart in
Figure 2 summarizes our methodology.
2. Data Description
Hourly oceanographic data from the buoy is available for the years 2004–2013 on these variables: wave height (H, in meters), wind speed (u, in m/s), and wind direction (W, from 0 to 360 degrees in increments of 10). Since the sensors do not record wind speeds lower than 0.25 m/s, such low speeds are recorded as 0.25 m/s (censoring). We use the wind direction at each sample time to estimate the fetch, the distance from the wave observation location to the nearest land in the upwind direction. Since wind direction is reported in 10-degree increments, 36 fetch lengths are required.
Hourly wind speed and wind direction data are also available since 1974 at Sikorsky Memorial Airport, Bridgeport CT, which is approximately 45 km to the west of the buoy. A simple empirical analysis of data from 2005 to 2013 reveals the relationship between observations from the buoy and from the coastal station. A Pearson correlation coefficient of in wind speed values, and a Kendall rank correlation coefficient of in the wind direction values indicate a positive correlation between the wind patterns from the buoy and coastal station. The Sikorsky station consistently records lower wind speeds than the buoy since the boundary shear stress over land is larger.
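As a hedged illustration of the two association measures mentioned above, the following Python sketch computes a Pearson correlation and a Kendall tau-b rank correlation for paired series. The function names and the toy values are assumptions made for illustration; they are not the paper's data or code. Note that a rank correlation applied to raw direction values ignores the wrap-around at 0/360 degrees.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kendall_tau_b(x, y):
    """Kendall tau-b rank correlation (O(n^2) pair scan, adjusts for ties)."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue            # tied in both: counted in neither term
            elif dx == 0:
                ties_x += 1         # tied in x only
            elif dy == 0:
                ties_y += 1         # tied in y only
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom


# Toy values for illustration only (not observations from the buoy or station).
buoy_speed = [3.1, 5.0, 7.2, 4.4, 9.8]
station_speed = [2.0, 3.9, 5.5, 3.0, 7.1]
r = pearson(buoy_speed, station_speed)
tau = kendall_tau_b(buoy_speed, station_speed)
```

The pair scan is quadratic in the series length, so for several years of hourly data a sorted-merge implementation would be preferable.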
Data for each year is grouped into a windy season, which includes the months of November, December, and January–March, and a calm season, which includes April–October. For example, the 2007–2008 windy season consists of data from November 2007–December 2007 and January–March 2008, while the 2008 calm season includes data from April 2008–October 2008. The data must be preprocessed before building and training statistical models. The buoy and coastal station data have data gaps, which are addressed in Section 2.1. Section 2.2 describes how the coastal station wind data is matched to wind data from the buoy.
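The seasonal grouping can be sketched in Python as follows. The label format and the function name `season_label` are assumptions for illustration; only the month-to-season assignment comes from the text.

```python
from datetime import datetime

# Months assigned to the windy season in the text; all others are calm.
WINDY_MONTHS = {11, 12, 1, 2, 3}


def season_label(ts):
    """Return a label such as 'windy 2007-2008' or 'calm 2008' for a timestamp."""
    if ts.month in WINDY_MONTHS:
        # November and December open a windy season that ends the next March.
        start = ts.year if ts.month >= 11 else ts.year - 1
        return f"windy {start}-{start + 1}"
    return f"calm {ts.year}"


label = season_label(datetime(2008, 2, 15, 12))   # falls in the 2007-2008 windy season
```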
2.1. Missing Data Imputation
Missing data occurs when the buoy or the coastal station sensor fails to record any information at a scheduled report time, usually due to hardware failure. The run-length of missing observations ranges from 1 to 2000 data points for the buoy, and from 1 to 235 data points for the Sikorsky station.
As shown below, missing data on wind speed and direction are first imputed, and then used for wave height imputation. In each year, the imputation is made separately for the windy and calm seasons.
2.1.1. Wind Speed and Wind Direction Imputation
The steps below describe imputation of wind speed and direction from the buoy as well as the coastal station. The imputation strategy depends on the run-length of the missing observations, denoted by .
We impute the wind speed and wind direction separately via linear interpolation using the
imputeTS package in R with the built-in
function [
12]. Since the values of wind direction range from 0 to 360 in increments of 10, the interpolated value is rounded to the nearest multiple of 10.
For wind speed, we first linearly interpolate a missing value using the R package, and include randomness in the imputation by adding a factor
, where
and sd
denotes the standard deviation of non-missing wind speed values. To be consistent with observed wind speeds from the sensors, we censor imputed values smaller than 0.25 m/s and record them as 0.25 m/s.
Here, we jointly impute missing values in wind speed and wind direction in order to preserve the relationship between them. To do this, we replace each missing pair of wind speed and wind direction with a pair sampled randomly (with replacement) from the observed data.
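A minimal Python sketch of the two imputation strategies described above is given here, under stated assumptions. The authors work in R with the imputeTS package; the helper names below are hypothetical, the random perturbation added to interpolated wind speeds is omitted, and the run-length cut-off that selects between the two strategies is left to the caller. Linear interpolation of direction also ignores the wrap-around at 0/360 degrees.

```python
import random


def interpolate_gaps(values):
    """Linearly interpolate interior runs of None; leading/trailing gaps are left as None."""
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is None:
            j = i
            while j < n and out[j] is None:
                j += 1                      # j is the first observed index after the gap
            if i > 0 and j < n:
                left, right = out[i - 1], out[j]
                span = j - (i - 1)
                for k in range(i, j):
                    out[k] = left + (right - left) * (k - (i - 1)) / span
            i = j
        else:
            i += 1
    return out


def round_direction(deg):
    """Round a non-negative direction to the nearest multiple of 10 degrees (halves round up)."""
    return 10 * int(deg / 10 + 0.5)


def censor_speed(speed, floor=0.25):
    """Record speeds below the sensor floor as the floor value."""
    return max(speed, floor)


def resample_pairs(speeds, directions, rng):
    """Replace each missing (speed, direction) pair by a pair drawn with replacement from the observed pairs."""
    observed = [(s, d) for s, d in zip(speeds, directions)
                if s is not None and d is not None]
    return [(s, d) if s is not None and d is not None else rng.choice(observed)
            for s, d in zip(speeds, directions)]


filled_speed = [censor_speed(v) for v in interpolate_gaps([0.3, None, 0.1])]
filled_dir = [round_direction(v) for v in interpolate_gaps([40, None, 70])]
```

Sampling speed and direction as a pair, rather than separately, is what keeps their joint behavior intact for long gaps.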
2.1.2. Wave Heights
Since wave heights are strongly affected by wind speed, information on wind speeds is used to impute missing wave heights. Wind speed values are grouped into four ordered bins based on their quartiles. The missing wave heights corresponding to wind speeds in each bin are imputed by the median of observed wave heights in that bin, and injected with some randomness by the term , where , and c is a scale factor (a damping factor set to avoid over-volatility) computed from the residual standard error obtained by fitting a linear model on the non-imputed data during the initial empirical data analysis. The value of c is calculated as for calm months, and for windy months.
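The bin-median imputation can be sketched in Python as below. This is an illustration under assumptions, not the authors' code: the distribution of the random term is not recoverable from the text, so a standard normal draw scaled by `c` is assumed, and the function names are hypothetical. Setting `c=0` gives the pure bin-median fill.

```python
import random
import statistics


def bin_index(speed, edges):
    """Index (0 to 3) of the quartile bin that a wind speed falls into."""
    return sum(speed > e for e in edges)


def impute_wave_heights(speeds, heights, c=0.0, rng=None):
    """Fill None wave heights with the median observed height of the matching wind-speed bin.

    ASSUMPTION: the random perturbation is c times a standard normal draw.
    """
    rng = rng or random.Random(0)
    edges = statistics.quantiles(speeds, n=4)        # three quartile cut points
    by_bin = {b: [] for b in range(4)}
    for s, h in zip(speeds, heights):
        if h is not None:
            by_bin[bin_index(s, edges)].append(h)
    medians = {b: statistics.median(v) for b, v in by_bin.items() if v}
    filled = []
    for s, h in zip(speeds, heights):
        if h is None:
            # Raises KeyError if a bin has no observed wave heights at all.
            h = medians[bin_index(s, edges)] + c * rng.gauss(0.0, 1.0)
        filled.append(h)
    return filled


# Toy values for illustration only.
speeds = [1, 2, 3, 4, 5, 6, 7, 8]
heights = [0.1, 0.2, None, 0.4, 0.5, 0.6, None, 0.8]
filled = impute_wave_heights(speeds, heights, c=0.0)
```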
2.1.3. Summary of Imputed Data
The pre-processed data consists of 8760 hourly observations for a non-leap year, where 3624 of them belong to a windy season and 5136 belong to a calm season. For a leap year, there are 8784 hourly observations, with 3648 coming from a windy season and 5136 from a calm season. The data for each hour consists of either observed or imputed wave heights, wind speed, wind direction, and fetch.
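The hour counts quoted above follow directly from the calendar, as this short Python check shows. The function name `season_hours` is an assumption; the windy season ending in a given year is taken to be November and December of the previous year plus January to March of that year.

```python
import calendar

CALM_MONTHS = (4, 5, 6, 7, 8, 9, 10)


def season_hours(year):
    """Return (windy_hours, calm_hours) for the windy season ending in `year` and the calm season of `year`."""
    windy_days = (
        calendar.monthrange(year - 1, 11)[1]
        + calendar.monthrange(year - 1, 12)[1]
        + sum(calendar.monthrange(year, m)[1] for m in (1, 2, 3))
    )
    calm_days = sum(calendar.monthrange(year, m)[1] for m in CALM_MONTHS)
    return 24 * windy_days, 24 * calm_days


non_leap = season_hours(2007)   # February 2007 has 28 days
leap = season_hours(2008)       # February 2008 has 29 days
```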
2.2. Transforming Coastal Station Wind Data
Our exploratory data analysis shows a positive association between the wind speed and direction from the buoy and the coastal station, although the latter consistently records lower wind speeds. To adjust for this level difference, we transform the Sikorsky station wind speed data to match the wind speed data from the buoy in 2005–2013, using the steps below. Let , , and respectively denote the buoy wind speed, Sikorsky station wind speed, and wind direction.
Step 1. Using information from the storm event database [
13], we define a binary variable
to indicate the occurrence of an extreme event along the Long Island Sound within a 12-h period prior to time
t. That is,
Step 2. We divide the data into three groups based on the wind direction recorded at the Sikorsky station: (i) East to West, with a corresponding range of ( with 3813 observations), (ii) West to East with ( with 10,262 observations), and (iii) All other directions ( with 18,580 observations).
Step 3. Within each group
, we fit a linear regression model with
as response and
and
as predictors, i.e.,
. The estimates of the coefficients from the three groups are shown in
Table 1. We use these to estimate the transformed Sikorsky station wind speed as the fitted values from the regression:
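A minimal Python sketch of this group-wise transformation follows, under several assumptions: the predictors are taken to be the station wind speed and the binary storm indicator, the direction sectors in `example_group_of` are placeholders (the actual sector boundaries are not recoverable from the text), and all function names are hypothetical. Ordinary least squares is solved through the normal equations.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] for predictor tuples in `rows`."""
    X = [[1.0, *r] for r in rows]
    p = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    Xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def example_group_of(direction):
    """PLACEHOLDER sectors for illustration; not the boundaries used in the paper."""
    if 60 <= direction <= 120:
        return "east_to_west"
    if 240 <= direction <= 300:
        return "west_to_east"
    return "other"


def fit_by_group(records, group_of):
    """records: (buoy_speed, station_speed, storm_flag, direction). Returns {group: coefficients}."""
    grouped = {}
    for buoy, station, storm, direction in records:
        rows, ys = grouped.setdefault(group_of(direction), ([], []))
        rows.append((station, float(storm)))
        ys.append(buoy)
    return {g: fit_ols(rows, ys) for g, (rows, ys) in grouped.items()}


def transformed_speed(coefs, station_speed, storm):
    """Fitted value used as the transformed station wind speed."""
    b0, b1, b2 = coefs
    return b0 + b1 * station_speed + b2 * storm
```

Each group needs enough non-collinear observations for the normal equations to be solvable; otherwise `solve` fails with a division by zero.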
3. Threshold Regression GARCH Model for Wave Heights
We build a statistical model for wave heights in a given season (windy, or calm) in any given year. In a windy season, the wind speeds and wave heights are generally higher than in a calm season. To reflect this, we build separate models for each season. Starting from a linear regression motivated by an approximation of Goda’s simplified Sverdrup-Munk-Bretschneider (SMB) model [
14], we build a rich model that incorporates lagged linear and nonlinear relations between wave heights and wind.
3.1. Goda’s Simplified SMB Model
Ocean wave dynamics are closely linked to wind behavior. Let
H denote wave height (m),
u be the wind speed (m/s),
F be the fetch length (m) for a given wind direction, and
g be the gravitational acceleration (9.807 m/s²). Goda’s consolidated method is described by [4]
The equation in (
1) is highly non-linear. We construct a simplifying linear approximation as follows:
where,
,
, and
. Applying the Maclaurin series expansion
, we get
Now, we can explore an approximate linear relationship between H and functions of wind speed u and fetch F.
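For orientation, the Python sketch below evaluates the Wilson-type fetch-limited wave height formula as revisited by Goda [4], gH/u² = 0.30{1 − [1 + 0.004 (gF/u²)^(1/2)]^(−2)}, together with a third-order Maclaurin expansion of (1 + z)^(−2). The displayed equations of this section were lost in extraction, so treating this as the form used here, and the particular expansion order, are assumptions; the function names are hypothetical.

```python
import math

G = 9.807  # gravitational acceleration, m/s^2


def smb_wave_height(u, fetch):
    """Wave height (m) from wind speed u (m/s) and fetch (m), Wilson-type formula."""
    z = 0.004 * math.sqrt(G * fetch / u ** 2)
    return 0.30 * (u ** 2 / G) * (1.0 - (1.0 + z) ** -2)


def smb_wave_height_approx(u, fetch):
    """Approximation using (1 + z)^-2 ~ 1 - 2z + 3z^2 - 4z^3 for small z.

    Expanding the bracket gives terms proportional to u*sqrt(F), F, and F**1.5/u,
    i.e. three functions of wind speed and fetch that enter linearly.
    """
    z = 0.004 * math.sqrt(G * fetch / u ** 2)
    return 0.30 * (u ** 2 / G) * (2 * z - 3 * z ** 2 + 4 * z ** 3)


exact = smb_wave_height(10.0, 10_000.0)          # 10 m/s wind over a 10 km fetch
approx = smb_wave_height_approx(10.0, 10_000.0)
```

The expansion is only accurate when z is small, which holds for short fetches relative to u²/g; for long fetches the truncated series degrades.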
3.2. Threshold Regression Model for Wave Heights
We use the approximation in (
3) as a starting point to construct a suitable regression model for wave heights as a function of lagged wave heights, lagged exogenous predictors, and their interactions with lagged wave heights, as well as an additional threshold effect to accommodate non-linearity. Let
, and
denote the hourly observations on wave height, wind speed, and fetch respectively. The model includes different components that are discussed below.
Correlation analysis and wave physics suggest that the square of lagged wind speed,
, should be included as a distinct, stand-alone term in a regression model for wave height. Additionally, we consider three functions of wind speed and fetch that emerge from the SMB approximation in (
3), i.e.,
Empirical evidence based on cross-correlation function (CCF) plots between the exogenous predictors and wave heights shows that lags are useful for modeling. We include as predictors and , where .
Based on empirical evidence from the ACF plots of wave heights, we include lagged wave heights for
, i.e.,
as predictors (
6). We also include two-way interactions between the lagged exogenous predictors and lagged wave heights, i.e., interactions of
with
and
where
and
.
Wave height behavior may be considerably different at low or high wind speeds. In order to capture the behavior of wave heights at low wind speed of
m/s, we include an indicator (segmenting) variable as a predictor:
We also conjecture that the behavior of
may be different when
, where
e is an unknown threshold parameter. Since a threshold parameter can capture a nonlinear relationship between a response and a predictor, we include the thresholding effect of
by including as predictor,
Incorporating all the above effects, we write the general form of the threshold regression model for
as
where
, and
K are set to six based on the empirical analysis.
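To make the structure of the predictors concrete, the following Python sketch assembles one design row per hour from lagged wave heights, lagged squared wind speeds, a low-wind indicator, and a hinge-type threshold term. It is a simplified illustration: the fetch-based SMB terms and the interaction terms are omitted, the hinge form max(x − e, 0) and the use of lag 1 for the indicator and threshold are assumptions, and `low_speed` and `e` must be supplied by the caller because their values are not recoverable from the text.

```python
def design_row(H, u, t, n_lags, low_speed, e):
    """Predictors for H[t] built from lags 1..n_lags of wave height and squared wind speed."""
    lagged_h = [H[t - k] for k in range(1, n_lags + 1)]
    lagged_u2 = [u[t - k] ** 2 for k in range(1, n_lags + 1)]
    low_wind = 1.0 if u[t - 1] <= low_speed else 0.0     # segmenting indicator
    hinge = max(u[t - 1] ** 2 - e, 0.0)                  # threshold effect (assumed hinge form)
    return lagged_h + lagged_u2 + [low_wind, hinge]


def design_matrix(H, u, n_lags, low_speed, e):
    """Return (X, y) with one row per time index that has a full lag history."""
    X = [design_row(H, u, t, n_lags, low_speed, e) for t in range(n_lags, len(H))]
    y = list(H[n_lags:])
    return X, y


# Toy series for illustration only; low_speed and e are hypothetical values.
H = [0.1 * k for k in range(10)]
u = [float(k) for k in range(10)]
X, y = design_matrix(H, u, n_lags=6, low_speed=0.25, e=20.0)
```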
3.3. Model Fitting Using Buoy Wind and Wave Heights Data
We have a large basket of predictors, including lagged wave heights, lagged exogenous predictors, and their interactions, as well as the segmenting and thresholding predictors. To avoid any issues due to multicollinearity, we first fit the linear portion of the model (ignoring the last two terms on the right side of (
6)), and retain predictors whose variance inflation factors (VIFs) do not exceed 30. The VIF of a predictor equals 1/(1 − R²), where R² is the coefficient of determination in a linear regression of that predictor on all other predictors, so it assesses the predictor's collinearity with them; a VIF of 30 corresponds to R² ≈ 0.967. While a cutoff VIF value of 20 (R² = 0.95) or 10 (R² = 0.90) has also been suggested in the literature, by using a VIF threshold of 30, our model incorporates all predictors derived from the SMB approximation while accounting for multicollinearity. We then fit the model in (
6) with the retained predictors, and the segmenting and thresholding effects. We use the R package
chngpt which employs an exact maximum likelihood estimation approach [
15]. That is, we choose a grid of candidate change points that uniformly span the empirical distribution of the quantiles of the predictor we threshold (here
), and estimate the change point
. We illustrate our model fitting for the windy and calm seasons in 2007–2008. The code and results for other years are available on GitHub at
https://github.com/NamithaVionaPais (accessed on 5 January 2023).
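The grid search over candidate change points can be sketched in Python as below. This is a deliberately reduced illustration, not the chngpt implementation: it fits a single-predictor hinge model y = a + c·max(x − e, 0) by least squares at each candidate threshold and keeps the threshold with the smallest residual sum of squares, which coincides with maximum likelihood under Gaussian errors. The function names and the 5% to 95% quantile range are assumptions.

```python
def fit_hinge(x, y, e):
    """Least-squares fit of y = a + c * max(x - e, 0); returns (sse, a, c)."""
    h = [max(v - e, 0.0) for v in x]
    n = len(y)
    mh, my = sum(h) / n, sum(y) / n
    shh = sum((a - mh) ** 2 for a in h)
    if shh == 0.0:
        # Hinge term is constant for this threshold: fall back to an intercept-only fit.
        return sum((b - my) ** 2 for b in y), my, 0.0
    c = sum((a - mh) * (b - my) for a, b in zip(h, y)) / shh
    a0 = my - c * mh
    sse = sum((b - a0 - c * a) ** 2 for a, b in zip(h, y))
    return sse, a0, c


def estimate_change_point(x, y, lower_q=0.05, upper_q=0.95):
    """Scan observed x values between two empirical quantiles; return (e_hat, a, c, sse)."""
    xs = sorted(x)
    n = len(xs)
    candidates = sorted(set(xs[int(lower_q * (n - 1)): int(upper_q * (n - 1)) + 1]))
    sse, a0, c, e_hat = min(
        (fit_hinge(x, y, e) + (e,) for e in candidates), key=lambda r: r[0]
    )
    return e_hat, a0, c, sse


# Synthetic data with a known change point at 40, for illustration only.
x = [float(i) for i in range(100)]
y = [1.0 + 2.0 * max(v - 40.0, 0.0) for v in x]
e_hat, a_hat, c_hat, sse = estimate_change_point(x, y)
```

Scanning every distinct observed value is quadratic in the sample size; for a season of hourly data a thinned grid of quantiles keeps the cost manageable.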
We show results for data corresponding to the windy season which includes November and December 2007, and January, February, and March 2008. The fitted threshold regression model for wave heights is
The threshold parameter e corresponding to is estimated to be .
The calm season includes data from April–October 2008, for which the fitted threshold regression model is
Here, the threshold parameter e is estimated as . The change point for the squared wind speed predictor in the windy season is higher (almost double) than that in the calm season. This aligns with physical theory, since squared wind speeds lie on a larger scale in the windy season than in the calm season. The threshold regression model has a good in-sample fit and is consistent with the physical theory expressed in the approximation of the SMB equation.
3.4. GARCH Model to Handle Residual Nonlinearity and Volatility
Residual and squared residual diagnostics from the fitted threshold regression model help us to assess whether we have adequately captured all linear and nonlinear associations between the response and predictors. Let
denote the residuals from a fitted model (
6). Diagnostics based on the autocorrelation function (ACF) and partial autocorrelation function (PACF) of
[
16] confirm that linear temporal relationships are adequately explained. However, the ACF and PACF plots of the
squared residuals
indicate that some nonlinear dependence remains and has not been adequately explained by the threshold model.
The class of generalized autoregressive conditionally heteroscedastic (GARCH) models [
17] is useful for fitting nonlinear time series which exhibit conditional heteroscedasticity. GARCH models belong to a class of univariate time series models that enable us to model volatility (conditional standard deviation) and study non-linear dependence over time. These models are often used in conjunction with linear regression and linear time series models to capture temporal dependence of different types.
We incorporate the nonlinear dependence into the model for
by fitting a suitable GARCH model to the residuals
, and then adding these fits to the fits from the threshold regression model (
6). After a thorough investigation of different error distributions and model orders, we select a GARCH(1,2) model for fitting the residuals from any season in any year:
where
is the conditional variance of
given the history,
are i.i.d.
, and
and
are unknown model parameters which are estimated using the R package
fGarch using the method of conditional maximum likelihood [
18].
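As a hedged illustration of how a GARCH conditional variance evolves, the Python sketch below runs the variance recursion for given parameter values. It assumes one ARCH term and two GARCH terms (the order convention of fGarch for a (1,2) specification), starts the recursion from the sample variance of the residuals, and uses hypothetical parameter names; it does not estimate the parameters, which the authors obtain by conditional maximum likelihood.

```python
def garch12_variance(resid, omega, alpha1, beta1, beta2):
    """Conditional variances under the assumed recursion
    sigma2[t] = omega + alpha1 * resid[t-1]**2 + beta1 * sigma2[t-1] + beta2 * sigma2[t-2].
    """
    n = len(resid)
    start = sum(r * r for r in resid) / n      # start-up value (an assumption)
    sigma2 = [start, start]
    for t in range(2, n):
        sigma2.append(
            omega
            + alpha1 * resid[t - 1] ** 2
            + beta1 * sigma2[t - 1]
            + beta2 * sigma2[t - 2]
        )
    return sigma2[:n]


# Toy residuals and parameter values for illustration only.
resid = [0.1, -0.2, 0.3, -0.1]
sigma2 = garch12_variance(resid, omega=0.01, alpha1=0.1, beta1=0.5, beta2=0.2)
```

A large squared residual at one hour raises the conditional variance at the next, which is how the model represents volatility clustering.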
The estimated parameters from the GARCH(1,2) fit to the residuals for the 2007–2008 windy season and 2008 calm season are shown in
Table 2. Let
denote the fits from the GARCH model in (
9); the sign function is given by
We use the fits from (
10) to obtain the final estimates for wave heights as described in
Section 3.5.
3.5. Final Fitted Threshold Regression GARCH Model
The
final model for wave heights consists of fitting (
6) followed by (
9) and obtaining parameter estimates and in-sample fits. We present the results of
in feet.
3.5.1. In-Sample Fits from the Final Model
We fit the model to data from the 2007–2008 windy season and the 2008 calm season, and assess the fits for the same seasons. Let
be the fits from the threshold regression model in (
6). The fitted wave heights from the final threshold regression GARCH model are given by
Figure 3 and
Figure 4 respectively show the in-sample fits along with the observed wave heights for the 2007–2008 windy season and 2008 calm season. We observe that the in-sample fits have a remarkably close match with the observed wave heights for all months in both seasons. The root mean square error (RMSE) based on the in-sample model fits is
for the 2007–2008 windy season and
for the 2008 calm season (see the row corresponding to the Year 2008 in
Table 3).
To verify the robustness of the final threshold regression GARCH model, we fit the model to data from windy and calm seasons in each of the years from 2005 to 2013 using wave height and wind data from the buoy. The RMSE values based on in-sample fits for each season are shown in
Table 3. The low values of RMSE indicate that our model is able to accurately predict the wave heights for each year.
Another useful check is to use the transformed coastal wind data (see
Section 2.2) to fit the buoy wave heights for windy and calm seasons in 2005–2013. That is, we fit (
6) and (
9) and obtain
using Sikorsky coastal station wind data as predictors. The RMSE values based on the in-sample fits, using data on exogenous predictors from Sikorsky coastal station for windy and calm seasons in each of the years from 2005 to 2013 are shown in
Table 3. The small RMSE values indicate that our proposed modeling approach can adequately predict wave heights, even when we use the wind data from a nearby coastal station rather than from the buoy.
3.5.2. Ensemble Out-of-Sample Hindcasting Fits from the Final Model
Our main goal is to use our final model to predict wave heights for years when they are not observed (i.e., prior to 2005), by using wind data from the Sikorsky station as predictors. We refer to such prediction as back forecasting, or hindcasting.
Before we do this, it is essential to verify the out-of-sample predictive accuracy from our threshold regression GARCH model that uses the coastal wind data as predictors. It is also important to provide a framework for constructing ensemble hindcasts. We assess out-of-sample predictions of wave heights for the 2007–2008 windy season and the 2008 calm season, assuming that we do not observe these wave heights. To do this, we build the threshold regression GARCH model using data (i.e., wave heights from the buoy and wind data from the coastal station) from the 2008–2009 windy season until the 2013 calm season.
We describe the steps to get out-of-sample predictions of wave heights for the 2007–2008 windy season and the 2008 calm season. We refer to this as year .
Figure 5 and
Figure 6 respectively show the ensemble out-of-sample wave height predictions (in red) for the 2007–2008 windy season and 2008 calm season using the model trained on the years 2009–2013. At most time points, these ensemble hindcasts are close to the observed wave heights (in black). Even at times when they are lower than high observed wave heights, the latter fall within the 10-sd prediction interval.
For the 2007–2008 windy season and the 2008 calm season,
Table 4 shows the RMSE values by comparing the out-of-sample predictions
with the observed values
for each training year
y. We also compute the RMSE values based on the ensemble hindcast
(
15). The reasonably small RMSE values provide convincing evidence that our hindcasting approach is useful.
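The accuracy summary and the ensemble construction can be sketched in Python as follows. The averaging rule is an assumption, since the ensemble equation is not recoverable from the text: each training year is taken to yield one prediction series, the ensemble hindcast is their pointwise mean, and the interval is the mean plus or minus `k` pointwise standard deviations, with `k` chosen by the caller. The function names are hypothetical.

```python
import math
import statistics


def rmse(pred, obs):
    """Root mean square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def ensemble_hindcast(member_preds, k):
    """Pointwise mean of member series and a mean +/- k*sd band.

    member_preds: list of equal-length prediction series, one per training year
    (at least two members are needed for the standard deviation).
    """
    mean, lower, upper = [], [], []
    for values in zip(*member_preds):
        m = statistics.mean(values)
        s = statistics.stdev(values)
        mean.append(m)
        lower.append(m - k * s)
        upper.append(m + k * s)
    return mean, lower, upper


# Toy member predictions and observations, for illustration only.
members = [[1.0, 2.0], [3.0, 4.0]]
mean, lower, upper = ensemble_hindcast(members, k=1.0)
error = rmse(mean, [2.5, 2.5])
```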
4. Hindcasting Several Decades of Wave Heights
Leveraging results from
Section 3, we hindcast
unobserved wave heights prior to 2005 using transformed wind data from the Sikorsky station as predictors. Specifically, we obtain ensemble hindcasts of wave heights and the 10-standard deviation prediction interval estimates, using the approach in
Section 3.5.2. Here
since we use data from 2005–2013 to train the model.
We examine the validity of these hindcasts using boxplots of mean and maximum wave heights for each month; see
Figure 7. The figure on the left shows boxplots for each month based on the mean wave heights for that month between 1974 and 2004. For example, the boxplot for January is constructed from the mean values for January from 31 years. The figure on the right shows similar boxplots for each month based on the maximum wave heights. In addition, we show as red dots the observed mean wave heights (
Figure 7 (left)) and the observed maximum wave heights (
Figure 7 (right)) for each month for the years 2005–2013. These plots show that the mean and maximum wave heights across each month over the years 1974–2004 are relatively consistent with the observed mean and maximum heights for the years 2005–2013.
5. Extreme Value Analysis of Wave Heights
An
m-year return value of wave heights denotes a value exceeded on average once every
m years, and can be used to design safety control measures and appropriate coastal structures [
20]. The daily recorded maxima (over a 24-h period) are usually used to conduct extreme value analysis and estimate the return values using an approach such as peaks over threshold (POT); see [
1,
21].
While long time series of wave heights would allow us to obtain accurate return value estimates, records spanning several decades are rarely available in practice. To estimate the return values, we use the point hindcasts of the wave heights and the prediction intervals from 1974 to 2004, along with the observed wave heights from 2005 to 2013.
The POT approach consists of fitting a generalized Pareto (GP) distribution to the tail of the data consisting of values that exceed a given threshold
u and then estimating the return values based on the rate of occurrence of the exceedances over the threshold. The cumulative distribution function (c.d.f.) of the GP distribution is given by
where
,
and
(the real line).
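For reference, the standard generalized Pareto c.d.f. for an excess y over the threshold, G(y) = 1 − (1 + ξy/σ)^(−1/ξ) with scale σ > 0 and shape ξ, can be evaluated as in the Python sketch below. The displayed equation was lost in extraction, so the parameterization shown (as in Coles [1]) is an assumption, and the function name is hypothetical.

```python
import math


def gp_cdf(y, sigma, xi):
    """Generalized Pareto c.d.f. of an excess y = x - u over the threshold u."""
    if y <= 0.0:
        return 0.0
    if abs(xi) < 1e-12:
        return 1.0 - math.exp(-y / sigma)      # exponential limit as xi -> 0
    z = 1.0 + xi * y / sigma
    if z <= 0.0:
        return 1.0                             # beyond the upper end point when xi < 0
    return 1.0 - z ** (-1.0 / xi)


p = gp_cdf(2.0, sigma=1.0, xi=0.5)   # toy parameter values, not fitted estimates
```

A positive shape gives a heavy upper tail, a zero shape gives an exponential tail, and a negative shape gives a tail bounded at −σ/ξ above the threshold.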
We analyze the 1974–2013 wave heights data using the POT approach. We use the R package
POT [
22] to fit the distribution in (
17) to the daily maximum of wave heights with the threshold
u set to 5 ft. The package employs an exact maximum likelihood (ML) approach to estimate the return values. We use the observed wave heights from 2005–2013; for the years 1974–2004, we use three setups, i.e., (i) hindcasted wave heights
, (ii) the lower bounds
, and (iii) the upper bounds
as defined in
Section 3.5.2.
The maximum likelihood estimates of the GP scale and shape parameters for each of the three setups are shown in
Table 5 (see rows 2, 3 and 4). In addition, we also conduct extreme value analysis on the observed daily maximum wave heights data
(2005–2013) and the maximum likelihood estimates are shown in
Table 5 (see row 1).
The return value plots from the POT estimation for each setup are shown in
Figure 8. We use the return level plots to estimate the
m-year
return value defined as the value that is expected to be equaled or exceeded on average once every
m years (with a probability of
) for
. The
m-yr return value
is calculated as
The return value estimates for each setup are shown in Table 6. We observe that the return level estimates based on the observed wave heights for the years 2005–2013 are considerably lower than the estimates obtained using observed and hindcasted wave heights (specifically, the hindcast estimates and the upper limit of the prediction interval) for the years 1974–2013. Because they draw on a record of four decades rather than nine years, the latter estimates provide a more reliable basis for designing offshore and coastal systems. These estimates will allow practitioners to design ships and offshore and coastal structures by taking into consideration the most extreme wave conditions they might need to withstand during their lifetime.
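The return value calculation in a peaks-over-threshold setting can be sketched in Python as below. It uses the standard expression from Coles [1], x_m = u + (σ/ξ)[(mλ)^ξ − 1], where λ is the mean number of threshold exceedances per year; whether this matches the paper's displayed equation term by term cannot be confirmed from the extracted text, and the parameter values in the example are hypothetical rather than the fitted estimates.

```python
import math


def return_level(m, u, sigma, xi, rate_per_year):
    """m-year return level for a GP tail above threshold u.

    rate_per_year: mean number of exceedances of u per year
    (observations per year times the probability of exceeding u).
    """
    a = m * rate_per_year
    if abs(xi) < 1e-12:
        return u + sigma * math.log(a)         # exponential-tail limit
    return u + (sigma / xi) * (a ** xi - 1.0)


def gp_survival(y, sigma, xi):
    """P(excess > y) under the GP distribution, for y >= 0 and 1 + xi*y/sigma > 0."""
    if abs(xi) < 1e-12:
        return math.exp(-y / sigma)
    return (1.0 + xi * y / sigma) ** (-1.0 / xi)


# Hypothetical values for illustration: threshold 5 ft, 10 exceedances per year.
x50 = return_level(50, u=5.0, sigma=1.0, xi=0.2, rate_per_year=10.0)
# Consistency check: the expected number of exceedances of x50 per year is 1/50.
yearly_rate_above_x50 = 10.0 * gp_survival(x50 - 5.0, sigma=1.0, xi=0.2)
```

The check at the end restates the definition: the m-year return level is the height exceeded on average once every m years.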
It is useful to explore the possible impact of major weather cycles such as the Southern Oscillation Index (SOI) and the North Atlantic Oscillation (NAO) on the analysis and findings. To do this, we can first group the years corresponding to SOI (El Niño) and SOI (La Niña), or similarly, NAO and NAO , and then compare the annual average wave height exceedances over a given threshold (say, 4, 5, or 6 ft) between the groups. Extreme value analysis can then be implemented separately for each of these groups to look for evidence of significant differences. Since the meteorology of our site (Long Island Sound) is typical of the northwest Atlantic, the extreme value analysis may not be sensitive to the weather cycles, but other areas may have substantial decadal-scale cycles that should be considered in the empirical hindcasting of wave conditions.
6. Discussion and Summary
This study presents m-yr return value estimates of wave heights that are helpful in designing offshore and coastal structures for safety. Such estimates are often difficult to obtain because sufficient data are unavailable. Therefore, we develop a predictive model that uses wind data from a proximal coastal station (Sikorsky) to predict wave heights near the buoy. As an initial preprocessing step, we set up an imputation technique to obtain complete hourly wave and wind data. A threshold regression GARCH model for wave heights near the buoy is built for two different seasons, windy and calm, for each year using wind data near the buoy. Next, we investigate the prediction efficacy of this model (in-sample and out-of-sample) when the wind data from the buoy is replaced by the transformed wind data from the Sikorsky station. Once we establish the validity of our model, we use the available wind data from the Sikorsky station for the preceding 30-plus years to hindcast wave heights near the buoy.
By treating these hindcasts as estimates of the unobserved past wave heights, we conduct extreme value analysis, using the POT approach on the daily maxima of the wave heights, to estimate the m-yr return values. Since these estimates are based on a longer historical record, they can be used to design better coastal protection structures. Our study aims to improve coastal flood risk assessments by synthesizing long records of wave data from existing wind data and by thoroughly investigating the temporal dependence of wind-wave behavior.
An alternative useful future investigation is to model the hourly wave heights from 2005 to 2013 as a single long time series, and indicate differential effects of windy and calm seasons through dummy variables treated as additional predictors. While this approach would alleviate the need to distinguish between leap and non-leap years, it would require the inclusion of multiple thresholds corresponding to windy and calm seasons. There may be value to examining this issue further, but we think it would require a substantial additional effort and a modification of the approach.
Author Contributions
Conceptualization, N.V.P., N.R. and J.O.; methodology, N.V.P., N.R. and J.O.; formal analysis, N.V.P., N.R. and J.O.; investigation, N.V.P., N.R., E.S. and J.O.; resources, J.O.; data curation, N.V.P. and E.S.; writing—original draft preparation, N.V.P., N.R., E.S. and J.O.; writing—review and editing, N.V.P., N.R. and J.O.; visualization, N.V.P., N.R. and J.O.; supervision, N.R. and J.O.; project administration, N.R. and J.O.; funding acquisition, N.R. and J.O. All authors have read and agreed to the published version of the manuscript.
Funding
Funding for this project was provided by the Connecticut Institute for Resilience and Climate Adaptation (CIRCA) through their climate research seed grants program. In addition, O’Donnell was supported by the United States Department of Housing and Urban Development through the Community Block Grant National Disaster Recovery Program, as administered by the State of Connecticut, Department of Housing.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Conflicts of Interest
The authors declare no conflict of interest.
References
- Coles, S. An Introduction to Statistical Modeling of Extreme Values, 1st ed.; Springer: Berlin/Heidelberg, Germany, 2001. [Google Scholar]
- Mathiesen, M.; Goda, Y.; Hawkes, P.J.; Mansard, E.; Martín, M.J.; Peltier, E.; Thompson, E.F.; Van Vledder, G. Recommended practice for extreme wave analysis. J. Hydraul. Res. 1994, 32, 803–814. [Google Scholar] [CrossRef]
- US Army Corp of Engineers. Shore Protection Manual; Vol 1 P-652; CERC Department of the Army, U.S. Army Corps of Engineers: Washington, DC, USA, 1984.
- Goda, Y. Revisiting Wilson’s formulas for Simplified Wind-Wave Prediction. J. Waterw. Port Coast. Ocean Eng. 2003, 129, 93–95. [Google Scholar] [CrossRef]
- Panchang, V.; Jeong, C.K.; Demirbilek, Z. Analyses of Extreme Wave Heights in the Gulf of Mexico for Offshore Engineering Applications. J. Offshore Mech. Arct. Eng. 2013, 135, 031104. [Google Scholar] [CrossRef]
- US Army Corp of Engineers. North Atlantic Coast Comprehensive Study: Resilient Adaptation to Increasing Risk; Technical Report P-116; U.S. Army Corps of Engineers: Washington, DC, USA, 2015.
- Liu, C.; Onat, Y.; Jia, Y.; O’Donnell, J. Modeling nearshore dynamics of extreme storms in complex environments of Connecticut. Coast. Eng. 2021, 168, 103950. [Google Scholar] [CrossRef]
- Gao, J.; Ma, X.; Dong, G.; Chen, H.; Liu, Q.; Zang, J. Investigation on the effects of Bragg reflection on harbor oscillations. Coast. Eng. 2021, 170, 103977. [Google Scholar] [CrossRef]
- Gao, J.; Zhou, X.; Zhou, L.; Zang, J.; Chen, H. Numerical investigation on effects of fringing reefs on low-frequency oscillations within a harbor. Ocean Eng. 2019, 172, 86–95. [Google Scholar] [CrossRef]
- Liu, C.; Jia, Y.; Onat, Y.; Cifuentes-Lorenzen, A.; Ilia, A.; McCardell, G.; Fake, T.; O’Donnell, J. Estimating the annual exceedance probability of water levels and wave heights from high resolution coupled wave-circulation models in long island sound. J. Mar. Sci. Eng. 2020, 8, 475. [Google Scholar] [CrossRef]
- Nadal-Caraballo, N.C.; Melby, J.A. North Atlantic Coast Comprehensive Study Phase I: Statistical Analysis of Historical Extreme Water Levels with Sea Level Change; Technical Report; Engineer Research and Development Center Vicksburg MS Coastal and Hydraulics LAB: Vicksburg, MS, USA, 2014.
- Moritz, S.; Bartz-Beielstein, T. imputeTS: Time series missing value imputation in R. R J. 2017, 9, 207.
- NCDC. NOAA Storm Events Database. 2023. Available online: https://www.ncdc.noaa.gov/stormevents/ (accessed on 1 October 2022).
- Sverdrup, H.U.; Munk, W.H. Wind, Sea and Swell: Theory of Relations for Forecasting; Hydrographic Office: Taunton, UK, 1947.
- Fong, Y.; Huang, Y.; Gilbert, P.B.; Permar, S.R. chngpt: Threshold regression model estimation and inference. BMC Bioinform. 2017, 18, 454.
- Shumway, R.H.; Stoffer, D.S. Time Series Analysis and Its Applications; Springer: Berlin/Heidelberg, Germany, 2000; Volume 3.
- Bollerslev, T. Generalized autoregressive conditional heteroskedasticity. J. Econom. 1986, 31, 307–327.
- Wuertz, D.; Runit, S.; Chalabi, M.Y. Package ‘fGarch’; Technical Report, Working Paper/Manual, 09.11.2009; R Core Team: Vienna, Austria, 2013.
- Ravishanker, N.; Chi, Z.; Dey, D.K. A First Course in Linear Model Theory; CRC Press: Boca Raton, FL, USA, 2021.
- Caires, S.; Sterl, A. 100-Year Return Value Estimates for Ocean Wind Speed and Significant Wave Height from the ERA-40 Data. J. Clim. 2005, 18, 1032–1048.
- Caires, S. Extreme Value Analysis: Wave Data. JCOMM Technical Report No. 57. In Technical Report; World Meteorological Organization: Geneva, Switzerland, 2011.
- Ribatet, M.; Dutang, C. POT: Generalized Pareto Distribution and Peaks Over Threshold; R Package Version 1.1-10; R Core Team: Vienna, Austria, 2022.
Figure 1.
NOAA buoy located in the Central Long Island Sound and owned by University of Connecticut, Department of Marine Sciences. This buoy records meteorological data as well as wave height and period.
Figure 2.
Summary of the methodology.
Figure 3.
Observed and in-sample fits of wave heights using the threshold regression GARCH model for the 2007–2008 windy season.
Figure 4.
Observed and in-sample fits of wave heights using the threshold regression GARCH model for the 2008 calm season.
Figure 5.
Ensemble hindcasts for the 2007–2008 windy season trained on years 2009–2013.
Figure 6.
Ensemble hindcasts for the 2008 calm season trained on years 2009–2013.
Figure 7.
Boxplots of the mean (left) and maximum (right) hindcasted wave heights for each month for the years 1973–2004. The red dots indicate the mean (left) and the maximum (right) of the observed wave heights for each month for the years 2005 to 2013.
Figure 8.
Return level plots based on (i) for the years 2005–2013, (ii) with for the years 1974–2013, (iii) with for the years 1974–2013, and (iv) with for the years 1974–2013. The estimates for the 10-year, 50-year, and 100-year return levels, along with their confidence intervals, are shown in the top left of each panel.
Table 1.
Estimated regression coefficients along with the standard errors for the transformed Sikorsky station wind speeds . Group denotes wind direction: East to West (), West to East (), and All other directions ().
| Group | Windy Season | Windy Season | Windy Season | Calm Season | Calm Season | Calm Season |
|---|---|---|---|---|---|---|
| 1 | 1.3503 | 0.8429 | 0.8338 | 1.2087 | 0.8209 | 0.3530 |
|  | (0.1910) | (0.8429) | (0.1674) | (0.1696) | (0.0123) | (0.1607) |
| 2 | 4.7836 | 0.6602 | −0.6969 | 3.9107 | 0.409 | −0.5313 |
|  | (0.2274) | (0.0124) | (0.2162) | (0.1706) | (0.0116) | (0.1612) |
| 3 | 3.6884 | 0.6253 | 0.0878 | 1.8282 | 0.5289 | 0.8880 |
|  | (0.1306) | (0.0091) | (0.1227) | (0.0935) | (0.0077) | (0.0884) |
Table 2.
GARCH model estimates along with their standard errors for 2007–2008 windy and 2008 calm season.
Season | | | | |
---|
Windy | | | | |
Calm | | | | |
Table 3.
RMSE for in-sample fits using the threshold regression GARCH model on the buoy wave and wind data for each year 2005–2013 using model estimates for the same year.
| Year | Windy Season, Buoy-Buoy | Windy Season, Buoy-Sikorsky | Calm Season, Buoy-Buoy | Calm Season, Buoy-Sikorsky |
|---|---|---|---|---|
| 2005 | 0.2957 | 0.3419 | 0.2330 | 0.2450 |
| 2006 | 0.3209 | 0.3284 | 0.1119 | 0.1535 |
| 2007 | 0.2886 | 0.4067 | 0.1772 | 0.2043 |
| 2008 | 0.2556 | 0.2865 | 0.1849 | 0.2052 |
| 2009 | 0.2507 | 0.2874 | 0.1894 | 0.2471 |
| 2010 | 0.2358 | 0.2915 | 0.2120 | 0.2422 |
| 2011 | 0.2065 | 0.2420 | 0.2268 | 0.2435 |
| 2012 | 0.2603 | 0.2758 | 0.2319 | 0.2575 |
| 2013 | 0.2857 | 0.3422 | 0.1909 | 0.2171 |
Table 4.
RMSE for out-of-sample fits for the 2007–2008 windy season and the 2008 calm season using the threshold regression GARCH model trained on data from the years 2009–2013.
| Training year | 2007–2008 Windy Season RMSE | 2008 Calm Season RMSE |
|---|---|---|
| 2009 | 0.9065 | 0.7084 |
| 2010 | 0.9944 | 0.7111 |
| 2011 | 0.8980 | 0.7220 |
| 2012 | 0.9167 | 0.7675 |
| 2013 | 0.9516 | 0.7093 |
| Ensemble | 0.8554 | 0.6527 |
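The RMSE values in Table 4 compare hindcast wave heights with buoy observations. As a minimal sketch of that kind of comparison, the Python below computes a per-training-year RMSE and an ensemble RMSE. It assumes the ensemble hindcast is the pointwise mean of the individual hindcasts, which is an assumption made here for illustration. The function names and the numbers are hypothetical and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def ensemble_mean(members):
    """Pointwise mean across hindcasts (one sequence per training year).

    Assumption: the ensemble is a simple average of its members.
    """
    return [sum(values) / len(values) for values in zip(*members)]


# Illustrative wave heights only (hypothetical, not from the paper).
observed = [0.5, 1.0, 1.5, 1.0]
hindcasts = {
    "2009": [0.7, 1.2, 1.3, 0.8],
    "2010": [0.3, 0.8, 1.7, 1.2],
}

per_year = {year: rmse(observed, h) for year, h in hindcasts.items()}
ensemble = rmse(observed, ensemble_mean(list(hindcasts.values())))
```

In this toy case the two members err in opposite directions, so averaging cancels their errors. Under the averaging assumption, the ensemble RMSE can never exceed the largest member RMSE, although it need not fall below every member as it does in Table 4.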
Table 5.
POT model estimates along with the standard errors based on observed wave heights from 2005–2013 and hindcasted wave heights from 1974–2004.
Setup | | |
---|
(2005–2013) | | |
| | |
(1974–2013) | | |
| | |
(1974–2013) | | |
| | |
(1974–2013) | | |
| | |
Table 6.
m-year return value estimates.
Setup | | | |
---|
(2005–2013) | | | |
(1974–2013) | | | |
(1974–2013) | | | |
(1974–2013) | | | |
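For orientation, an m-year return value under a peaks-over-threshold model can be computed from the generalized Pareto parameters with the standard formula given in Coles (2001). The sketch below is a hand-written Python version of that formula, not the API of the POT R package cited above. The function name, argument names, and example parameter values are hypothetical, and the exceedance rate is assumed to be expressed as the average number of threshold exceedances per year.

```python
import math


def gpd_return_level(m_years, threshold, scale, shape, exceedances_per_year):
    """m-year return level for a generalized Pareto (POT) model.

    x_m = u + (sigma / xi) * ((m * lam) ** xi - 1)   for xi != 0
    x_m = u + sigma * log(m * lam)                   for xi == 0
    where lam is the mean number of exceedances of u per year.
    """
    if scale <= 0 or m_years <= 0 or exceedances_per_year <= 0:
        raise ValueError("scale, m_years and exceedances_per_year must be positive")
    n_exceedances = m_years * exceedances_per_year
    if abs(shape) < 1e-12:  # exponential-tail limit of the GPD
        return threshold + scale * math.log(n_exceedances)
    return threshold + (scale / shape) * (n_exceedances ** shape - 1.0)


# Hypothetical parameter values, for illustration only.
levels = {
    m: gpd_return_level(m, threshold=1.0, scale=0.5, shape=-0.1,
                        exceedances_per_year=2.0)
    for m in (10, 50, 100)
}
```

With a negative shape parameter, as in this example, the return levels grow slowly with m and are bounded above by threshold − scale/shape.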
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).