1. Introduction
Climate change and sea level rise play a major role in the risk of flooding in coastal areas [1]. Coastal flooding and erosion can result from several factors such as storm surges, tsunamis, subsidence, and high rainfall events, as discussed in [2,3]. Rising mean air temperature and sea level will increase the expected frequency of these events [4], which has prompted the design of interventions to reduce their impacts. There is wide geographic variability in the relative importance of these flooding mechanisms. Many coastal communities are located in bays or separated from the ocean by islands and are, therefore, sheltered from the direct effects of long period and high amplitude ocean waves. However, wind-driven storm surges freely propagate into bays to cause coastal flooding. Many other towns have been established near the mouths of rivers where they are vulnerable to flooding caused by high precipitation or snow melt in the watershed. The construction of seawalls or levees is often considered as a flood defence measure to address all three of these threats. However, a significant disadvantage of this approach is that it complicates the management of storm water during periods of high precipitation. Retention basins and pumps are often employed, and the system design must be informed by the joint probability distribution of high rainfall rates and high water levels or wave heights.
Extreme value analysis [5] is usually employed to develop design criteria for hazard mitigation projects. Univariate extreme value analysis has been extensively studied in offshore and coastal safety by investigating the behavior of coastal wave heights [6,7], wind speed [8], and precipitation [9,10]. Insurance companies are also interested in assessing the risk from weather extremes, in order to design strategies to cope with, and adapt to, increased risks and expected damages [11]. The most popular univariate extreme value methods include Peaks Over Threshold (POT) by the Generalized Pareto Distribution (GPD), as well as the Block Maxima approach [5]. While univariate extreme value methods are a useful tool to investigate extreme behavior of individual events, bivariate extreme value methods should be employed when the co-occurrence of extremes in two processes must be considered.
Modeling the joint behavior of two or more events provides valuable insights into the likelihood and severity of extreme events, leading to more effective risk management strategies. For instance, the authors of [12] discuss modeling wave extremes by capturing the dependence among storm intensity, directionality, and intra-time distribution, offering improved boundary conditions for wave and near-shore analyses. In particular, traditional models for bivariate extremes are based on limiting joint distributions for the extreme values in each margin of a bivariate sample. Bivariate extreme value theory has been well developed and includes analogous extensions of the POT and Block Maxima approaches [5]. In addition, copulas [13] are a popular choice for modeling bivariate extremes. Copulas have extensive applications in storm modeling [14,15,16], hydrology [17,18,19], and coastal engineering [20,21]. For example, the dependence between observed water levels and precipitation, including impacts of sampling methods and distribution fitting and the resulting flood values, is explored using copulas in [22]. The joint distribution of rainfall and storm surge based on the copula function is investigated in [23]. The effect of internal climate variability on copula-based compound event analysis is studied in a case study in the Netherlands by [24]. Ref. [25] provides a review of commonly used techniques for estimating the tail dependence of a joint distribution. Ref. [26] discusses a tail dependence matrix, where a multivariate dependence measure is constructed using this bivariate tail dependence structure.
In this paper, we employ various bivariate extreme value methods to estimate the dependence between sea level and precipitation based on their joint probability of exceedances. By modeling the joint behavior of two variables, we can evaluate the dependence between the two variables in their extremes. We discuss and summarize the results from various bivariate extreme value analysis methods with respect to the bivariate sea level and precipitation data from Bridgeport, CT. Our goal in this study is to (i) explore various bivariate extreme value methods to estimate the dependence structure on the bivariate data, and (ii) compare the dependence structure between sea level and precipitation in the presence and absence of tidal harmonics. Of course, at some sites, river flow and water level, or wave conditions, may have to be considered, so in the discussion section we comment on how the methods we have considered may be extended to more than two variables.
The remainder of this paper is organized as follows. Section 2 describes the bivariate daily maximum sea level and precipitation data from Bridgeport, CT. Section 3 discusses different bivariate extreme value methods for modeling the joint behavior of two variables and the resulting dependence structure. In Section 4, we discuss the dependence structure between sea level and precipitation after adjusting for sea level harmonics.
2. Data Description and Exploration
We analyze data with regard to two variables, sea level and precipitation, to investigate the dependence in their extremes. Observations on sea level have been recorded at hourly intervals from approximately 200 water level gauges across various locations in the United States by the National Oceanic and Atmospheric Administration (NOAA) and predecessor agencies. The data are shared through an interface at
https://coastwatch.pfeg.noaa.gov/erddap/tabledap/ (accessed on 17 May 2023). The hourly data are recorded in meters and measured with respect to the North American Vertical Datum of 1988 (NAVD 88), which is the official vertical datum of the United States used as a reference system by surveyors, engineers, and mapping professionals to measure and relate elevations to the Earth’s surface [27]. Therefore, the sea level data have negative values, corresponding to observations below the reference point.
Data on precipitation were obtained from the NOAA National Climatic Data Center, which archives 24 h precipitation totals for numerous stations in the United States recorded in inches. Ref. [
28] describes the database known as the Global Historical Climatology Network (GHCN)-Daily dataset, which contains daily data from over 80,000 surface stations worldwide, about two-thirds of which are for precipitation alone.
We analyzed sea level and precipitation data from Bridgeport, CT for the years 1970–2015. The sea level data are from the Bridgeport harbor, while the 24 h precipitation totals are from the Sikorsky Memorial Airport in Bridgeport. Time series plots of the two variables are shown in
Figure 1. In the precipitation data, no precipitation was reported on
of the days. Missing values in the daily maximum sea level and precipitation data were filled by linear interpolation using the R package imputeTS (R version 4.3.2).
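The gap-filling step described above can be illustrated with a minimal Python sketch. It is not the imputeTS implementation; the function name `interpolate_linear` is hypothetical, and the treatment of gaps at the start or end of the record (carrying the nearest observed value) is an assumption of this sketch.

```python
import math

def interpolate_linear(values):
    """Fill NaN gaps by linear interpolation between the nearest observed neighbours.

    Assumption of this sketch: leading and trailing gaps are filled with the
    nearest observed value, since there is no second point to interpolate with.
    """
    out = list(values)
    known = [i for i, v in enumerate(out) if not math.isnan(v)]
    if not known:
        raise ValueError("series has no observed values")
    # Edge gaps: carry the first/last observed value outward.
    for i in range(known[0]):
        out[i] = out[known[0]]
    for i in range(known[-1] + 1, len(out)):
        out[i] = out[known[-1]]
    # Interior gaps: straight line between consecutive observed points.
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out
```

Linear interpolation is a reasonable default for short gaps in a smooth series such as sea level; for an intermittent series such as daily precipitation it may introduce small non-zero values on dry days, which is worth checking when gaps are long.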
We consider exploratory investigations on the bivariate data using trend analysis in
Section 2.1. Next, we examine the tail behavior of each variable independently by conducting univariate extreme value analysis in
Section 2.2.
Section 2.3 describes the empirical joint behavior of sea level and precipitation.
2.1. Exploring Temporal Trend and Correlations
Although
Figure 1 does not appear to indicate a time trend, several methods can be used to check for a temporal trend and temporal correlation in the data, as summarized in
Table 1. The Mann–Kendall test and Sen’s slope test for a monotonic trend make minimal assumptions. Both methods include an asymptotic z-test, where the null hypothesis is that the data do not follow a monotonic trend. The Mann–Kendall test [29] is a non-parametric, rank-based approach that determines the presence of a monotonic trend in time series data by estimating Kendall’s $\tau$. Sen’s slope [30] provides a non-parametric alternative to ordinary least squares regression, calculating the median of all possible slopes between pairs of data points of a given time series. As expected for precipitation, both the Mann–Kendall test and Sen’s slope fail to reject the null hypothesis (with a p-value of 0.5 for both methods), indicating the absence of a monotonic trend in the precipitation data. For the daily maximum sea level data, the Mann–Kendall test and Sen’s slope reject the null hypothesis (with p-values < 0.05), indicating the presence of a monotonic trend. However, a Kendall’s $\tau$ of
and a Sen’s slope of 0.000009 indicate that the positive trend is very slight. We also investigated the presence of a time trend by conducting linear regression analysis, with day, month, and year as predictors. The regression analysis on precipitation shows no effect of time trend with an adjusted R-squared of
, while the regression analysis on daily maximum sea level shows a slight effect of a time trend with an adjusted R-squared of
.
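The two trend statistics discussed above can be sketched in a few lines of Python. This is an illustrative implementation, not the code used for Table 1: the function names are hypothetical, the variance of the Mann–Kendall statistic is computed without a tie correction (an assumption that matters for precipitation, which has many tied zero values), and the pairwise loops are quadratic in the series length.

```python
import math
from statistics import median

def mann_kendall(x):
    """Return (Kendall's tau, z statistic, two-sided p-value) for a series x.

    Assumption of this sketch: no tie correction in the variance of S.
    """
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = x[j] - x[i]
            s += (d > 0) - (d < 0)          # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)      # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
    tau = s / (0.5 * n * (n - 1))
    return tau, z, p_value

def sens_slope(x):
    """Median of all pairwise slopes (x[j] - x[i]) / (j - i), with j > i."""
    n = len(x)
    slopes = [(x[j] - x[i]) / (j - i) for i in range(n - 1) for j in range(i + 1, n)]
    return median(slopes)
```

With several decades of daily observations, a very small slope can still be statistically significant, which is consistent with a rejected null hypothesis alongside a slope close to zero.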
We investigated the temporal correlation in the data using the auto-correlation function (ACF).
Figure 2 shows the ACF plots for precipitation, squared precipitation, daily maximum sea level, and squared daily maximum sea level. The ACFs of precipitation and squared precipitation do not show any pattern, indicating an absence of temporal correlation. The ACFs of daily maximum sea level and squared daily maximum sea level show the presence of temporal correlation in the data. The Hurst exponent [31] is also estimated to measure long-range dependence in the time series, quantifying the series’ tendency to persist in a certain direction. The Hurst exponent H lies between 0 and 1, where $H < 0.5$ indicates anti-persistent behavior, $H = 0.5$ corresponds to a random walk, and $H > 0.5$ corresponds to persistent behavior. The empirical Hurst exponent of 0.5805, being close to 0.5, indicates essentially no persistent behavior in the precipitation data, while an empirical Hurst exponent of 0.7281 indicates the presence of moderate persistence in the daily maximum sea level data.
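As a rough illustration of the autocorrelation diagnostics above, the sample ACF can be computed as follows. This is a minimal sketch with a hypothetical function name `acf`; it uses the usual biased estimator (dividing every lag by the lag-0 sum of squares) and is not the routine used to produce Figure 2.

```python
def acf(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag (biased estimator)."""
    n = len(x)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)            # lag-0 sum of squares
    if c0 == 0:
        raise ValueError("series is constant")
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]
```

The ACF of the squared series is obtained by passing `[v * v for v in x]`; slowly decaying values at increasing lags are the pattern associated with temporal correlation.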
2.2. Univariate Extreme Value Analysis of Sea Level and Precipitation
We investigated the univariate extreme behavior by analyzing the daily maximum sea level and precipitation from 1970–2015 using the peaks over threshold (POT) by generalized Pareto distribution (GPD) approach [
5,
32]. The POT approach models observations that exceed a certain high threshold, say
w. This consists of fitting a generalized Pareto (GP) distribution to the tail of the data that exceed the threshold w, with cumulative distribution function (c.d.f.)
$$G(y) = 1 - \left[1 + \xi\,\frac{y - w}{\sigma}\right]^{-1/\xi},$$
where $y > w$, $\sigma > 0$, and $\xi \in \mathbb{R}$ (the real line). We implemented the POT method using the
extrememix [
33] R package, which employs a Bayesian framework to model each univariate data in order to estimate the GPD model parameters and threshold
w. The maximum likelihood estimates of the GPD scale ($\sigma$), shape ($\xi$), and threshold ($w$) for the sea level and precipitation data are shown in
Table 2.
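To make the peaks-over-threshold step concrete, the sketch below evaluates the GPD c.d.f. and obtains rough parameter estimates from the excesses over a fixed threshold. It deliberately swaps in a simple method-of-moments estimator instead of the Bayesian fit from extrememix, and it takes the threshold as given rather than estimating it. The function names and the symbols `sigma` (scale) and `xi` (shape) are choices made for this sketch.

```python
import math
from statistics import mean, variance

def gpd_cdf(y, w, sigma, xi):
    """GPD c.d.f. 1 - (1 + xi * (y - w) / sigma) ** (-1 / xi) for y above threshold w."""
    if y <= w:
        return 0.0
    z = (y - w) / sigma
    if abs(xi) < 1e-12:                     # exponential limit as xi -> 0
        return 1.0 - math.exp(-z)
    base = 1.0 + xi * z
    if base <= 0.0:                         # beyond the upper end point when xi < 0
        return 1.0
    return 1.0 - base ** (-1.0 / xi)

def gpd_moment_estimates(data, w):
    """Method-of-moments estimates (sigma, xi) from excesses over threshold w.

    Assumption of this sketch: the threshold w is fixed in advance, and the
    estimator is only meaningful when the true shape satisfies xi < 1/2.
    """
    excesses = [v - w for v in data if v > w]
    if len(excesses) < 2:
        raise ValueError("need at least two exceedances")
    m, v = mean(excesses), variance(excesses)
    ratio = m * m / v
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (ratio + 1.0)
    return sigma, xi
```

Moment estimates are a quick sanity check on a fitted tail; likelihood-based or Bayesian fits are generally preferred for inference because they also quantify uncertainty.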
Given the estimated GPD parameters, the return value, which is the value exceeded on average once every
m years (return period), is computed as
where
.
Figure 3 shows the return level plots for the sea level and precipitation data, with return values
plotted across different values of return period
m.
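For reference, a commonly used form of the m-year return level under a GPD tail is $z_m = w + \frac{\sigma}{\xi}\left[(m\, n_y\, \zeta_w)^{\xi} - 1\right]$, where $n_y$ is the number of observations per year and $\zeta_w$ is the probability of exceeding the threshold w. The notation here is an assumption and may differ from the expression used in this paper. A minimal Python sketch of that textbook formula, with a hypothetical function name and arguments:

```python
import math

def gpd_return_level(m, w, sigma, xi, zeta_w, n_per_year=365.25):
    """m-year return level for a GPD tail above threshold w.

    zeta_w is the probability that an observation exceeds w, and n_per_year is
    the number of observations per year (daily data assumed by default).
    """
    k = m * n_per_year * zeta_w             # expected exceedances of w in m years
    if k <= 0:
        raise ValueError("m * n_per_year * zeta_w must be positive")
    if abs(xi) < 1e-12:                     # limiting form as xi -> 0
        return w + sigma * math.log(k)
    return w + (sigma / xi) * (k ** xi - 1.0)
```

In practice `zeta_w` can be estimated as the fraction of observations above the threshold, and return level plots are obtained by evaluating the function over a grid of m values.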
The return values for
on daily maximum sea level and precipitation are shown in
Table 3. We can leverage the
m-year return value estimates of the daily maximum sea level (presented in
Table 3) to examine precipitation behavior at time points where daily maximum sea level exceeds
. For instance, the first plot in
Figure 4 shows the distribution of precipitation on days when the daily maximum sea level exceeds the 25-year return value. Similarly, the second plot in
Figure 4 shows the distribution of daily maximum sea level on days when precipitation exceeds
. In
Figure 4, we see that days on which the daily maximum sea level (or precipitation) exceeds the 10-year return value do correspond to higher values in the precipitation (or daily maximum sea level) data. Following the univariate exploration, we examine the empirical joint behavior of the two variables, described in
Section 2.3.
2.3. Empirical Joint Behavior of Sea Level and Precipitation
It is important to study the joint behavior of daily maximum sea level and precipitation and to investigate their dependence in order to characterize the likelihood of flooding resulting from both precipitation and coastal storm surge. This information can help design strategies for flood risk reduction due to sea level fluctuations and precipitation rates. We investigated the empirical joint dependence between sea level and precipitation using Spearman’s rank correlation coefficient, a scatter plot, and a cross-correlation function (CCF) plot. Spearman’s rank correlation coefficient [
34] is a non-parametric measure of rank correlation that assesses the statistical dependence between the rankings of two variables and the degree to which the relationship between two variables is monotonic. Spearman’s rank correlation coefficient is estimated to be 0.1981 between precipitation and water level, indicating a weak positive monotonic relationship. The scatter plot shown in
Figure 5 between the two variables does not show any linear trend. The cross-correlation plot shown in
Figure 6 indicates the strength of the linear relationship between daily maximum sea level and precipitation at different lags.
Figure 6 shows that the two variables have a maximum correlation at lag 0 with a correlation value of
. Thus, based on the empirical plots, the overall linear association between the two variables is weak.
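The rank correlation and lagged correlation used above can be sketched as follows. The function names are hypothetical, ties receive average ranks, and the lagged correlation is computed as a plain Pearson correlation on the overlapping part of the two series, which is a simplification of the usual CCF estimator (the latter normalizes by the full-series means and variances).

```python
import math

def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0           # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def lagged_correlation(x, y, lag):
    """Correlation between x[t + lag] and y[t] on the overlapping samples."""
    if lag >= 0:
        a, b = x[lag:], y[:len(y) - lag]
    else:
        a, b = x[:len(x) + lag], y[-lag:]
    return pearson(a, b)
```

Evaluating `lagged_correlation` over a range of lags (for example -5 to 5 days) gives a quick check on whether the strongest association occurs at lag 0.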
In addition to the overall association, it is important to examine the co-occurrence of high precipitation at times of anomalously high water level in order to determine if the coastal project design needs to account for the potential consequences of extremely unlikely events due to the joint occurrence of high rainfall and high sea level.
Figure 7 shows the empirical joint probability distribution function denoted by
for the daily maximum sea level and precipitation data. The joint probability of high values of sea level and precipitation is close to zero.
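An empirical joint exceedance probability of this kind can be estimated by simple counting. The sketch below is illustrative only: the function names and thresholds are hypothetical, and it treats the paired daily observations as equally weighted.

```python
def joint_exceedance_prob(x, y, x_thr, y_thr):
    """Fraction of paired observations with x > x_thr and y > y_thr."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    return sum(1 for a, b in zip(x, y) if a > x_thr and b > y_thr) / n

def independence_benchmark(x, y, x_thr, y_thr):
    """Product of the marginal exceedance fractions (value expected under independence)."""
    n = len(x)
    px = sum(1 for a in x if a > x_thr) / n
    py = sum(1 for b in y if b > y_thr) / n
    return px * py
```

Comparing the joint exceedance fraction with the independence benchmark at high thresholds gives a first indication of whether extremes co-occur more often than chance would suggest, although counts in the joint tail are typically small and therefore noisy.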
The dependence structure between two variables in the tail region can be evaluated using bivariate extreme value methods. A brief overview of various bivariate extreme value methods in the literature and the dependence measure obtained on the bivariate sea level and precipitation data based on each of the methods is described in
Section 3.
4. Dependence between Sea Level and Precipitation after Adjusting for Sea Level Harmonics
Tidal oscillations in ocean water level arise from the gravitational attraction between the earth, moon, and sun, and the centrifugal acceleration due to the rotation of the earth around the center of mass of the earth–moon–sun system. The periods of the earth’s rotation and orbit around the sun, as well as the moon’s orbit, are reflected in the oscillations of the sea surface. Other more subtle effects, like the oscillation of the axis of rotation of the earth and the ellipticity of the orbits of the earth and moon, add to the number of tidal frequencies, or tidal harmonics [
44,
45,
46]. In shallow coastal areas, tidal oscillations are further complicated by nonlinear dynamic interactions that generate additional harmonic and sub-harmonic frequencies. More than a hundred harmonics can be detected, though most have a very small amplitude. The amplitude and phase of each frequency varies spatially, largely due to local bathymetric and coastal geometry effects, but once estimated from observations, accurate predictions can be made. By identifying and removing the effect of tidal harmonics in water level observations, one can further examine the dependence structure between the storm-forced sea level variations and precipitation.
To obtain the daily maximum sea level after removing the effect of harmonics, we investigated the behavior of the hourly sea level data to check for the presence of harmonics. We conducted a harmonic analysis on the hourly sea level data using the UTide Matlab (ver. R2021b) toolbox [
47]. The unified tidal analysis can handle irregularly distributed record times and is suitable for multi-year analyses. Once the harmonic analysis was conducted, we took the daily maximum of the residuals from the harmonic analysis, together with the precipitation, as the input data for the bivariate extreme value analysis. The results for precipitation and daily maximum sea level after removing the effect of harmonics are presented in
Table 6. We observed a stronger dependence structure between the bivariate sea level and precipitation data when the harmonic effects from the sea level data have been removed.
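The residual step described above (subtract the predicted tide from the hourly observations, then take the daily maximum) can be sketched as follows. The harmonic fit itself is not reproduced here: the `predicted_tide` input is assumed to come from a separate tidal analysis such as UTide, and the function name is hypothetical.

```python
from datetime import datetime

def daily_max_residual(times, observed, predicted_tide):
    """Daily maximum of (observed - predicted tide) from hourly records.

    times: datetime objects, one per hourly observation.
    predicted_tide: tidal prediction at the same times (assumed to be supplied
    by a separate harmonic analysis).
    Returns a dict mapping each calendar date to its maximum residual.
    """
    daily = {}
    for t, obs, tide in zip(times, observed, predicted_tide):
        residual = obs - tide
        day = t.date()
        if day not in daily or residual > daily[day]:
            daily[day] = residual
    return dict(sorted(daily.items()))
```

The resulting daily series can then be paired with the 24 h precipitation totals by date before applying the bivariate methods.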
Since we have a long time series available (daily data from 1970–2015), we also assessed the temporal evolution of dependence between the precipitation and daily maximum sea level extremes with and without removing harmonics. To do this, we used the bivariate threshold excess model (outlined in
Section 3.1) on a series of data subsets. Each subset represents bivariate data encompassed within a 10-year window. This moving window analysis is considered with a step size of five years.
Figure 10 shows the temporal evolution of dependence, clearly showing an overall stronger dependence when we remove harmonic effects from the sea level data. The period from 1995 to 2005 shows the strongest dependence, with a value of 0.7757.
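The moving-window design (10-year windows advanced in 5-year steps) can be enumerated with a short helper. This is a sketch with hypothetical names; whether window end points are treated as inclusive or exclusive is an assumption, and here each window is the half-open interval from its start year up to, but not including, its end year.

```python
def moving_windows(first_year, last_year, width=10, step=5):
    """List (start, end) year pairs for windows of the given width and step."""
    windows = []
    start = first_year
    while start + width <= last_year:
        windows.append((start, start + width))
        start += step
    return windows

def select_window(records, window):
    """Keep (year, sea_level, precipitation) records with start <= year < end."""
    start, end = window
    return [rec for rec in records if start <= rec[0] < end]
```

Each subset returned by `select_window` would then be passed to the dependence estimator of choice, giving one dependence value per window.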
5. Discussion and Summary
This study presents the results of analyzing the dependence structure between sea level and precipitation extremes using bivariate data from Bridgeport, CT. We explored various bivariate extreme value methods, including the bivariate threshold excess model, the maxima approach, L-moments, and copulas. Our analysis shows no evidence that occurrences of extreme high sea level and extreme 24 h precipitation are correlated in the observational record at Bridgeport, CT, a station with long data records representative of coastal Connecticut. The largest surges occur in Southern New England when winds are from the east or northeast [
48] due to the passage of extratropical cyclones in the colder months (November–April), or tropical cyclones in late summer. However, high precipitation rates are associated with winds from the south [
49]. As both types of cyclones propagate across Southern New England, the winds from the northeast generally follow the conditions when precipitation is likely, and this may explain why the correlation is low. We further investigated the dependence structure after adjusting for the effect of harmonics on the hourly sea level data to remove the periodic influences of tidal processes. These repeating patterns, while natural and significant, can obscure the underlying trends and anomalies crucial for understanding long-term sea level changes and their implications. We observed that the dependence between the daily maximum sea level and precipitation is stronger after adjusting for the effect of harmonics.
It is critical to note that, though the methods demonstrated here are applicable to other sites, the fact that the occurrence of extreme rain rates and coastal water levels is uncorrelated is a site-specific result. Since the character of precipitation statistics varies regionally, and tides and storm surges are sensitive to the geometry and bathymetry of the regional coastline, it seems likely that the results may apply across Southern New England. However, additional data analysis is necessary to assess that. In other parts of the world, the extremes may be much more correlated. The methods we used could also be applied to examine the relationship between extreme wave height and surge level. These may show high correlations at some sites.
The IPCC 2021 [
50] concluded that it is “virtually certain” that global mean sea level will rise throughout the 21st century. They also report that for the eastern United States, there is high confidence in the predictions of an increase in the occurrence of high precipitation, and medium confidence that the wind speed during storms will also increase. However, the rate of change in extremes that we should anticipate is unclear. It is straightforward to assess the impact of a change in the mean sea level based on our results, and regional estimates of that are available. An analysis of the effect of global warming on the correlation between extreme winds and precipitation at the scales at which project information is required still needs to be conducted.
Future work could focus on investigating and applying extreme value methods to three or more variables. Extensions of the peaks-over-threshold approach [51], L-comoments [41], and the copula approach [52,53] in a multivariate framework have been developed. One could also investigate the dependence measure in a spatial framework by statistically modeling spatial extremes using max-stable processes [54], spatial copulas [55], or Bayesian hierarchical models [56].