Article

Marginal Distribution Fitting Method for Modelling Flood Extremes on a River Network

1
Coastal and Hydraulics Laboratory, Engineer Research and Development Center, U.S. Army Corps of Engineers, Vicksburg, MS 39180, USA
2
Risk Management Center, Institute for Water Resources, U.S. Army Corps of Engineers, Lakewood, CO 80228, USA
3
School of Mathematical and Statistical Sciences, Clemson University, Clemson, SC 29634, USA
*
Author to whom correspondence should be addressed.
GeoHazards 2023, 4(4), 526-553; https://doi.org/10.3390/geohazards4040030
Submission received: 27 October 2023 / Revised: 7 December 2023 / Accepted: 14 December 2023 / Published: 16 December 2023

Abstract

This study utilized a max-stable process (MSP) model with a dependence structure defined via a non-Euclidean distance metric, with the goal of modelling extreme flood data on a river network. The dataset was composed of mean daily discharge observations from 22 United States Geological Survey streamflow gaging stations for river basins in Missouri and Arkansas. The analysis included the application of the elastic-net penalty to automatically build spatially varying trend surfaces to model the marginal distributions. The dependence model accounted for the river distance between hydrologically connected gaging sites and the hydrologic distance, defined as the Euclidean distance between the centers of the sites’ associated drainage areas, for all stations. Modelling the marginal distributions and modelling the spatial dependence among the extremes are the two key components of spatially modelling extremes. Among the 16 covariates evaluated for marginal fitting, 7 were selected to spatially model the generalized extreme value (GEV) location parameter: for each gaging station’s contributing drainage basin, its outlet elevation, centroid x coordinate, centroid elevation, area, average basin width, elevation range, and median land surface slope. The three covariates selected for the GEV scale parameter were the area, average basin width, and median land surface slope. The GEV shape parameter was assumed to be constant throughout the entire study area. Comparisons of estimates obtained from the spatial covariate model with their corresponding “at-site” estimates resulted in computed values of 0.95, 0.95, 0.94 and 0.85, 0.84, 0.90 for the coefficient of determination, Nash–Sutcliffe efficiency, and Kling–Gupta efficiency for the GEV location and scale parameters, respectively.
Brown–Resnick MSP models were fit to independent multivariate events extracted from a set of common discharge data, transformed to unit Fréchet margins while considering different permutations of the non-Euclidean dependence model. Each of the fitted model’s log-likelihood values indicated improved fits when using hydrologic distance rather than Euclidean distance. They also demonstrated that accounting for flow-connected dependence and anisotropy further improved model fit. In this study, the results from both parts were illustrative; however, further research with larger datasets and more heterogeneous systems is recommended.

1. Introduction

Estimating the annual exceedance probability (AEP) for extreme floods is an important problem in hydrology for dam and levee safety. In risk assessments, the probability of failure for dams and levees often depends upon the magnitude of the hydrologic loading [1]. Hence, determining credible estimates of the AEPs of extreme floods that could lead to failure is necessary. The design flood AEP for most dams and levees is $1 \times 10^{-2}$ or less frequent. In the United States (U.S.), high hazard dams are designed to pass the Probable Maximum Flood (PMF), which typically has an AEP of $1 \times 10^{-4}$ or less frequent. In the U.S., most projects have limited flood data. The length of the observed discharge record at many sites is less than 100 years. As a result, the greatest source of error in AEP and quantile estimates for an at-site flood frequency analysis is often a limited observed discharge record. Epistemic uncertainties in the estimated AEPs for extreme floods can potentially be reduced by incorporating as much hydrologic information into the frequency analysis as reasonably possible [2,3,4,5].
Recent U.S. Army Corps of Engineers (USACE) applied research and development directed at the spatial analysis of hydrometeorological extremes has involved the development of pointwise and areal estimates of extreme precipitation [6,7] and extreme snow water equivalent (SWE) [8]. These studies have applied max-stable processes (MSPs), the stochastic process analogue of the multivariate extreme value distribution [9,10,11,12,13]. With their application, one can not only compute pointwise return level maps, but also model the joint distribution and more complex areal-based assessments of risk while working within the theoretically justified mathematical framework provided by extreme value theory (EVT). Areal precipitation frequency estimates derived from the application of an MSP do not require one to develop and apply empirical depth-area reduction factors to convert point to areal estimates [14]. The MSP based analyses performed by Skahill et al. [6,7,8] computed spatially varying pointwise estimates of extreme precipitation/SWE by leveraging gridded covariate data and employing recent advances in variable selection and model fitting [15,16,17].
With the application of MSPs, the extremal coefficient is a useful measure for summarizing the degree of spatial dependence among the extreme data [9,10,18]. Its values vary between one and two; a value of one indicates complete dependence, whereas a value of two corresponds to independence. It is also possible to estimate the extremal coefficient via the madogram, and these estimates are useful for MSP model checking [19]. For response variables such as precipitation, temperature, and wind speed, it is typical for the extremal coefficient to be modelled as a function of the Euclidean distance between any two locations [18,20,21].
Asadi et al. [22] introduced an MSP based model that leveraged a unique, non-Euclidean distance metric to model extreme flood data on river networks. Their approach utilized the river distance between hydrologically connected gaging sites, and the hydrologic distance, defined as the Euclidean distance between the centers of the sites’ associated drainage areas, for all stations. The hydrologic distance accounts for shared spatially variable meteorologic events and the geomorphometry of the river basin. Each of these two distance measures can potentially differ from the Euclidean distance. Several studies have proposed alternative methods to spatially model flood frequency; however, they have either assumed independence among the extreme data or accounted for spatial dependence in a manner that does not conform with EVT, hence potentially limiting their credibility for extrapolation [23,24,25,26,27,28,29].
The max-stable modelling approach proposed by Asadi et al. [22] holds promise for potential enhancements to USACE’s Bayesian estimation and fitting software BestFit version 1.0 [30]. At present, BestFit combines limited at-site flood data with temporal data on historic and paleofloods, spatial data from areal precipitation frequency estimates, and causal data from the application of a calibrated hydrologic model forced with rainfall frequency events. The MSP methodology they introduced could potentially yield a clearer delineation between spatial information expansion and causal information expansion data for future BestFit applications such that they better align with the original outline of the flood frequency hydrology concept [3,4,5]. For these reasons, we took this modelling approach in this study.
The primary contribution of this study was in demonstrating that the trend fitting methodology introduced by Love et al. [31] and applied by Skahill et al. [6,7,8] is also useful for modelling extreme flood data on river networks. With its application, this study was able to automatically evaluate a larger set of potential marginal modelling covariates than the set that Asadi et al. [22] originally considered in a manual manner. We also show that their treatment of extremal dependence, which involves a combination of river distance and hydrologic distance rather than Euclidean distance, resulted in a model of the extreme flood data that closely resembled what is commonly observed with dependence model summaries for MSP applications with extreme precipitation and SWE data [6,7,8].

2. Materials and Methods

2.1. Study Area

The 12,989-square-kilometer (km²) study area consists of the drainage area upstream of the streamflow gaging stations with identifying (ID) numbers 10, 14, 18, 21, and 22, as depicted in Figure 1a. It includes parts of the Current, Little Black, Eleven Point, Spring, and Strawberry River basins and contains 22 United States Geological Survey (USGS) streamflow gaging stations (Figure 1a). Each of these five river basins is a sub-basin of the Black River Basin (Figure 1). The Black River is entirely contained within the Salem Plateau Subdivision of the Ozark Plateau Physiographic Region, which is characterized as gently rolling topography with an abundance of karst features such as springs, sinkholes, and caves [32]. Greer Spring on the Eleven Point River, Mammoth Spring on the Spring River, and Big Spring on the Current River are the three largest springs in the Ozark Plateaus, whose geographic extent is depicted in Figure 1b [32].
The study area does not contain any major reservoirs, and the land area is predominantly forest and pastureland [32]. For the Current River Basin, approximately 80.1, 16, and 0.1% of its area is classified as forest, grassland, and urban land, respectively. The Little Black River Basin is classified to be approximately 54.9, 26.1, 17.7, and 0.1% forest, grassland, cropland, and urban land, respectively [33]. The Eleven Point River Basin is classified to be 65, 34, and 0.4% forest, grassland/cropland, and urban land, respectively [34]. Forest, grassland/cropland, and urban land classifications cover approximately 48.3, 49.1, and 2.4% of the Spring River Basin, respectively. The Strawberry River Basin is reported to be approximately 66 and 31.9% forest and pastureland/cropland, respectively.
Thirty-meter resolution raster datasets representative of 2001 and 2021 from the National Land Cover Database (https://www.mrlc.gov/data, accessed on 1 December 2023) were used to quantify land use changes from 2001 to 2021 for each of the five systems. Developed land increased by 0.3% within the Current River Basin and 0.2% for the remaining four basins. Forested/planted-cultivated land decreased by 0.3%/0.4%, 1.0%/0.3%, 1.6%/0.6%, 3.0%/0.8%, and 2.6%/0.8%, while shrubland/herbaceous land increased by 0.2%/0.1%, 1.4%/−0.4%, 0.8%/1.1%, 1.2%/2.4%, and 1.4%/1.8% within the Current, Little Black, Eleven Point, Spring, and Strawberry River Basins, respectively.
Adamski et al. [32] summarized the climate of the Ozark Plateaus (Figure 1b). It is characterized as temperate with its thunderstorm dominated severe weather season primarily occurring during the months from March to June. Wilkerson [33] reported the months from April to June to be the wettest for the Current and Little Black River Basins. Miller and Wilkerson [34] reported that March through May were the wettest months for the Eleven Point River Basin. Figure 2 summarizes the monthly mean precipitation climatology for each of the five basins, computed using the gridded Parameter-elevation Relationships on Independent Slopes Model (PRISM) monthly climate dataset representative of the period 1981–2010 [35]. The graphs in Figure 2 depict a trimodal distribution for the monthly mean precipitation climatology across all five basins with two larger modes occurring in May and November and a smaller mode in July. The months of April, May, and November were consistently the three wettest individual months across all five basins. Except for the Little Black River Basin, January, February, and August were consistently the three driest months. For the Little Black River Basin, January, February, June, and August were the four driest months.
Adamski et al. [32] indicated a general southeast directed increase for mean annual precipitation from minimum values in the north of the Ozark Plateaus to maximum values near its southern boundary. This trend is generally observed in Figure 3a, which depicts the mean annual precipitation computed using the PRISM monthly climate dataset. Figure 3b depicts the PRISM gridded mean annual precipitation dataset for a region surrounding the study area’s five river basins. The PRISM data-computed mean annual precipitation values for the Current, Little Black, Eleven Point, Spring, and Strawberry River basins were 1189, 1217, 1195, 1199, and 1218 millimeters (mm), respectively. This large degree of homogeneity in the mean annual precipitation climatology across the five systems is observed in Figure 3.
Adamski et al. [32] summarized mean monthly temperatures in the Ozark Plateaus to range from 30 to 38 degrees Fahrenheit (°F) during January, generally its coolest month, and from 78 to 82 °F in July, typically the warmest month. The mean monthly mean and minimum temperature values presented in Figure 4 for the Current, Little Black, Eleven Point, Spring, and Strawberry River basins, computed using the gridded PRISM dataset, support their mean temperature climatology summary [32] that January and July are the coolest and warmest months, with computed values of 0.4, 1.0, 1.0, 1.6, 2.3 and 25.2, 26.0, 25.5, 25.9, and 26.4 degrees Celsius (°C), respectively. The mean monthly mean temperature values presented in Figure 4 are above 0 °C across all months for all five watershed systems. However, the PRISM mean monthly minimum temperature values presented in Figure 4 are below 0 °C for all five river basins during the months of January, February, and December. Across the Current, Little Black, Eleven Point, Spring, and Strawberry River basins, March was consistently the fourth coolest month, with mean March minimum temperature values of 1.0, 1.9, 2.2, 1.6, and 2.7 °C, respectively.
Figure 5 and Figure 6 and Table 1 present the spatial distribution and summary statistics of elevations and basin slopes throughout the Current, Little Black, Eleven Point, Spring, and Strawberry River Basins. The scale of the raster digital elevation model data presented in Figure 5 is one arc second, and its source is the U.S. Geological Survey 3D elevation program. The basin slopes presented in Figure 6 were computed using the digital elevation model data shown in Figure 5. For the Current, Little Black, Eleven Point, Spring, and Strawberry River Basins, the computed maximum basin reliefs were 389, 177, 374, 304, and 219 meters (m), respectively. The hypsometric curves shown in Figure 7a present a fair degree of similarity across the five watershed systems that is not easily apparent upon examination of Figure 5 and Table 1. By contrast, the plots presented in Figure 7b of basin specific land surface slopes reinforce the information presented in Figure 6 and Table 1.
Adamski et al. [32] summarized the streams of the Black River Basin as fast flowing with minimum and maximum monthly streamflows within the Ozark Plateaus generally occurring between July and October and between March and May, respectively. Wilkerson [33] reported flood frequencies from Alexander and Wilson [36] for several USGS streamflow gaging stations located within the Current River Basin, including for the stations with IDs 1, 3, 8, 9, and 10, as depicted in Figure 1a. They listed values of 27,300, 50,700, 68,700, 93,500, 113,000, and 185,000 cubic feet per second for 2-, 5-, 10-, 25-, 50-, and 100-year return periods for the station with ID number 10, as shown in Figure 1a. Miller and Wilkerson [34] also listed flood frequency values from Alexander and Wilson [36] for two USGS streamflow gaging stations located within the Eleven Point River Basin, viz., the stations with ID numbers 16 and 17 as depicted in Figure 1a. Southard and Veilleux [37] computed and reported flood frequency values for 14 of the 22 USGS streamflow gaging sites shown in Figure 1a, i.e., stations with ID numbers 1, 3, 6–8, 9, 10, and 15–21, as depicted in Figure 1a.

2.2. Discharge Data

Mean daily discharge data were collected from the USGS National Water Information System for each of the 22 USGS streamflow gaging stations whose locations are shown in Figure 1a. For each gaging station, Table 2 summarizes its assigned ID, USGS station number, location, upstream drainage area, period of record, and number of missing data values. Figure S1 includes the period of record plots of the mean daily discharge data for each station. Based on the precipitation and temperature climatology for the study area, only the daily streamflow data from April to November were used for analysis. Table 3 lists the begin date, end date, and number of missing April–November mean daily data values for each of the 22 streamflow gaging stations. Figure S2 includes plots of the April–November mean daily discharge data for each station. Calendar year annual maxima were computed for each of the 22 streamflow gaging stations using the April–November (seasonal) mean daily data. Figure S3 includes plots of the seasonal (April–November) annual maxima for each station. Table 3 lists the number of April–November (seasonal) annual maxima for each station. Figure S4 includes plots which were used to define thresholds and decluster the seasonal mean daily discharge data to extract independent storm events for application of Equation (1). Common measurements were required for the spatial dependence modelling. Overall, 9 of the 22 stations with common April–November (seasonal) mean daily discharge measurements for the period 2002–2020 were used for the dependence modelling. Their station IDs were 04–07, 09, 10, 13, 17, and 18.
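The seasonal annual maxima step described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the processing code used in the study: the function name `seasonal_annual_maxima`, the `(date, discharge)` record layout, and the discharge values are all hypothetical.

```python
from datetime import date

def seasonal_annual_maxima(records, first_month=4, last_month=11):
    """Return {calendar year: maximum discharge} using only in-season days.

    records: iterable of (date, discharge) pairs; a discharge of None marks
    a missing value and is skipped (hypothetical layout for this sketch).
    """
    maxima = {}
    for day, discharge in records:
        if discharge is None:
            continue  # missing daily value
        if not (first_month <= day.month <= last_month):
            continue  # outside the April-November season
        if day.year not in maxima or discharge > maxima[day.year]:
            maxima[day.year] = discharge
    return maxima

# Made-up daily values for illustration only
records = [
    (date(2019, 3, 30), 900.0),   # March: excluded from the season
    (date(2019, 5, 2), 410.0),
    (date(2019, 11, 30), 655.0),
    (date(2020, 4, 1), None),     # missing value
    (date(2020, 7, 15), 120.0),
    (date(2020, 12, 1), 990.0),   # December: excluded from the season
]
annual_max = seasonal_annual_maxima(records)
```

Restricting to a season before taking block maxima means that a large out-of-season flow, such as the March value above, never enters the extreme value sample.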

2.3. Covariate Data

Asadi et al. [22] evaluated four covariates to model the marginal distributions throughout $T$: the latitude of the centroid, size, mean elevation, and mean slope for the contributing drainage area associated with each gaging station. In this study, 16 covariates were evaluated for marginal fitting (Table 4). In total, 14 of the 16 covariates listed in Table 4 were readily computed for each gaging station’s contributing drainage area using a geographic information system and raster digital elevation model dataset. The basin average length and average width were each computed using the estimates for the basin area and perimeter [38]. Precipitation climatology was not included as a covariate due to the large degree of homogeneity that was observed throughout the study area for the mean annual precipitation (Figure 3). Estimated values of the 16 covariates for each gaging station’s contributing drainage area are provided in Table S1.

2.4. Methods

In this study, we applied the EVT-based MSP modelling approach introduced by Asadi et al. [22]. The modelling analysis involved two parts, one to model the spatially variable marginal distributions and another to properly account for the spatial dependence among the observed flood data [6,7,8,22]. Asadi et al. [22] comprehensively outlined their MSP based modelling approach, and herein we only highlight a few of its essential features. We encourage the interested reader to refer to their work for the full details [22]. The following section, which discusses fitting the marginal distributions, includes a description of aspects that were unique to this study.

2.4.1. Marginal Fitting

In univariate EVT, it can be shown that a distribution is max-stable if and only if it is the generalized extreme value (GEV) distribution [39]. Mathematical nondegenerate limit law expressions of max-stability exist in the multivariate and spatial process settings [10]. In either case, univariate EVT results guarantee that the marginal distributions of an MSP are max-stable GEV distributions, possibly with GEV model parameters that may vary spatially. Ribatet [10,11], Ribatet et al. [9], Davison et al. [12], and Cooley et al. [13] provided thorough summaries of MSPs and MSP based modelling.
Asadi et al. [22] presented a threshold exceedance Poisson point process independence likelihood for marginal fitting:
$$L(\xi_j, a_{j,n}, b_{j,n}) = \exp\left\{ -n_j \left[ 1 + \xi_j \frac{q_{j,p} - b_{j,n}}{a_{j,n}} \right]^{-1/\xi_j} \right\} \times \prod_{i \in I_j} a_{j,n}^{-1} \left[ 1 + \xi_j \frac{X_{j,i} - b_{j,n}}{a_{j,n}} \right]^{-1/\xi_j - 1}, \tag{1}$$
where $\xi_j$, $a_{j,n}$, $b_{j,n}$, $n_j$, and $q_{j,p}$ denote the GEV shape, scale, and location parameters at fixed streamflow gaging site locations on the river network, $t_j$, $j = 1, \ldots, m$, the number of years of observations at location $t_j$, and the empirical $p$-quantile, $p \approx 1$, of the data $X_{j,i}$, $i = 1, \ldots, n$, for location $t_j$, respectively, and wherein $I_j = \{ i \in \{1, \ldots, n\} : X_{j,i} > q_{j,p} \}$. With this choice of $n_j$, the parameters $\xi_j$, $a_{j,n}$, and $b_{j,n}$ equal those in the GEV distribution for annual maxima.
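To make the structure of the threshold exceedance likelihood concrete, the following sketch evaluates its negative logarithm for a single site. It is a minimal illustration rather than the fitting code used in the study: the function name `pp_neg_log_likelihood`, the argument names, and the numerical values are assumptions, and the Gumbel limit (shape equal to zero) is deliberately not handled.

```python
import math

def pp_neg_log_likelihood(xi, a, b, exceedances, threshold, n_years):
    """Negative log of the Poisson point process likelihood for one site.

    xi, a, b    : GEV shape, scale, and location (annual-maxima scale)
    exceedances : declustered values above the threshold
    threshold   : empirical p-quantile used as the threshold
    n_years     : number of years of observations at the site
    Returns math.inf when the parameters fall outside the support.
    """
    if a <= 0 or abs(xi) < 1e-12:
        return math.inf  # sketch only: the xi -> 0 (Gumbel) limit is omitted
    u_term = 1.0 + xi * (threshold - b) / a
    if u_term <= 0:
        return math.inf
    # Contribution of the expected number of threshold exceedances
    nll = n_years * u_term ** (-1.0 / xi)
    # Contribution of each observed exceedance
    for x in exceedances:
        z = 1.0 + xi * (x - b) / a
        if z <= 0:
            return math.inf
        nll += math.log(a) + (1.0 / xi + 1.0) * math.log(z)
    return nll

# Hypothetical values for illustration only
example_nll = pp_neg_log_likelihood(
    xi=0.05, a=300.0, b=800.0,
    exceedances=[950.0, 1200.0, 2100.0], threshold=900.0, n_years=19,
)
```

A function of this form could be passed to a general-purpose optimizer, once per site for at-site estimates or with the parameters replaced by covariate-driven trend surfaces for a spatial fit.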
Trend surfaces were defined to support prediction throughout the entire river network, $T$. Trend surfaces spatially model the location, $\mu(s)$, scale, $\sigma(s)$, and shape, $\xi(s)$, parameters of the known GEV marginal distributions as a function of location $s$. For example, linear trend surfaces are of the form
$$\begin{aligned}
\mu(s) &= \eta_{\mu,0} + \eta_{\mu,1}\,\mathrm{cov}_{\mu,1} + \cdots + \eta_{\mu,n_\mu}\,\mathrm{cov}_{\mu,n_\mu}, \\
\sigma(s) &= \eta_{\sigma,0} + \eta_{\sigma,1}\,\mathrm{cov}_{\sigma,1} + \cdots + \eta_{\sigma,n_\sigma}\,\mathrm{cov}_{\sigma,n_\sigma}, \\
\xi(s) &= \eta_{\xi,0} + \eta_{\xi,1}\,\mathrm{cov}_{\xi,1} + \cdots + \eta_{\xi,n_\xi}\,\mathrm{cov}_{\xi,n_\xi},
\end{aligned}$$
where $\eta_{\cdot,i}$ and $\mathrm{cov}_{\cdot,i}$ are the parameters and covariates of the linear trend surface for $\mu(s)$, $\sigma(s)$, and $\xi(s)$, respectively. Factors that are assumed or known to influence extreme flood hydrology in a drainage basin, for example, climatological, morphometric, and physiographic data, were candidates to be included as covariates.
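As a small illustration of how a linear trend surface supports prediction at an arbitrary location, the sketch below evaluates a trend surface from a set of coefficients and then converts the resulting GEV parameters into a return level using the standard GEV quantile formula. The coefficient values, covariate names, and function names are hypothetical and are not taken from the fitted models in this study.

```python
import math

def linear_trend(intercept, coefs, covariates):
    """Evaluate intercept + sum_i coef_i * covariate_i at one location."""
    return intercept + sum(coefs[name] * covariates[name] for name in coefs)

def gev_return_level(mu, sigma, xi, aep):
    """GEV quantile with annual exceedance probability `aep` (0 < aep < 1)."""
    y = -math.log(1.0 - aep)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)  # Gumbel limit
    return mu + sigma / xi * (y ** (-xi) - 1.0)

# Hypothetical standardized covariates at an ungaged location
site = {"area": 0.8, "basin_width": -0.2, "median_slope": 0.4}

# Hypothetical trend surface coefficients (illustration only)
mu_s = linear_trend(900.0, {"area": 350.0, "median_slope": -40.0}, site)
sigma_s = linear_trend(400.0, {"area": 150.0, "basin_width": 30.0}, site)
xi_const = 0.05  # shape held constant over the region

q100 = gev_return_level(mu_s, sigma_s, xi_const, aep=0.01)
```

Covariates that do not appear in a coefficient dictionary simply do not enter that trend surface, which mirrors the sparse selections that the variable selection step is intended to produce.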
It is important to model the spatial variation of the marginal parameters by carefully “building relevant trend surfaces including any relevant covariable” [10]. Poor characterization of $\mu(s)$, $\sigma(s)$, and $\xi(s)$ complicates estimation of the dependence parameters [10,40]. In this study, linear trend surfaces for the known GEV marginal parameters were developed by leveraging the theory of spatial extremes [9,10] and recent advances for fitting general linear models [15,16,17,31].
The elastic-net penalty [41] was applied to regression models to facilitate model selection from among the set of potential covariate models using the trend surface fitting methodology introduced by Love et al. [31]. The elastic-net penalty is a convex combination of the penalties of ridge [42,43] and lasso [44] regression, and the resulting estimates are able to retain properties of both approaches. Given observations $y_i$, $i = 1, \ldots, n$, an $n \times m$ matrix of covariates $\mathrm{COV}$, and an assumed linear model:
$$y_i = \eta_0 + \eta_1\,\mathrm{cov}_{i,1} + \cdots + \eta_m\,\mathrm{cov}_{i,m}, \tag{2}$$
the elastic-net minimizes:
$$\frac{1}{2n} \sum_{i=1}^{n} \tilde{w}_i \left( y_i - \eta_0 - \eta\,\mathrm{cov}_i^{T} \right)^2 + \lambda \sum_{j=1}^{m} \left[ \frac{1}{2} (1 - \alpha)\, \eta_j^2 + \alpha \left| \eta_j \right| \right], \tag{3}$$
where $\lambda$ is non-negative and tuned to weight the penalty term; $\alpha \in [0, 1]$ controls the penalty term to vary from ridge to lasso regression at $\alpha = 0$ and $\alpha = 1$, respectively; and $\tilde{w}_i$ is the weight assigned to the $i$th observation [9]. Ridge regression results in solutions that include all the predictors, whereas application of lasso regression yields sparse, much more easily interpretable solutions [45]. As $\alpha$, which weights the relative contributions of the L1 and L2 penalties, increases from 0 to 1 for a given $\lambda$, the number of coefficients estimated to be exactly zero increases from 0 to the sparsity of the lasso solution [15].
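The elastic-net objective can be written out directly, which makes the roles of the two tuning parameters easy to see. The sketch below only evaluates the weighted objective for a given coefficient vector; it does not perform the minimization, which in this study was done with the R package ‘glmnet’. The function name `elastic_net_objective` and the example data are hypothetical.

```python
def elastic_net_objective(y, cov, eta0, eta, lam, alpha, weights=None):
    """Weighted elastic-net objective for a linear model.

    y       : list of n responses
    cov     : list of n rows, each a list of m covariate values
    eta0    : intercept
    eta     : list of m coefficients
    lam     : non-negative penalty weight (lambda)
    alpha   : mixing parameter in [0, 1]; 0 = ridge, 1 = lasso
    weights : optional list of n observation weights (defaults to 1)
    """
    n = len(y)
    w = weights if weights is not None else [1.0] * n
    fit = 0.0
    for yi, row, wi in zip(y, cov, w):
        resid = yi - eta0 - sum(e * c for e, c in zip(eta, row))
        fit += wi * resid ** 2
    penalty = sum(0.5 * (1.0 - alpha) * e ** 2 + alpha * abs(e) for e in eta)
    return fit / (2.0 * n) + lam * penalty

# Made-up data: two observations, one covariate
y = [1.0, 2.0]
cov = [[1.0], [2.0]]
lasso_like = elastic_net_objective(y, cov, 0.0, [1.0], lam=1.0, alpha=0.95)
ridge_like = elastic_net_objective(y, cov, 0.0, [1.0], lam=1.0, alpha=0.05)
```

With the same coefficient of magnitude one, the value of alpha shifts the penalty between the absolute-value term, which drives coefficients exactly to zero, and the squared term, which shrinks correlated coefficients together.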
Automatic variable selection was a primary aim for the marginal fitting analysis; therefore, we weighted the L1 penalty more heavily so that the elastic-net performed much like lasso regression while retaining ridge regression’s capacity to collectively shrink the coefficients for any highly correlated covariables [15,46]. To select the tuning parameter, cross validation (CV) was employed with each elastic-net model fit. Each elastic-net model was fit using the R software (version 4.2.1) package ‘glmnet’ [15]. For each model, the pseudo responses were made up of the three GEV univariate parameter estimates at each location, and a set of spatially varying covariates was used as the predictors. Independent elastic-net model fits were performed for $\mu(s)$ and $\sigma(s)$ and guided subsequent spatial GEV model fitting and selection. We note that we set $\xi(s) = \xi$; in EVT, it is common to treat the GEV shape parameter in this manner [18,47], especially over homogeneous regions. Figure 8 is a schematic diagram depicting the main elements of the marginal fitting method.

2.4.2. Dependence Model

Asadi et al. [22] introduced an MSP dependence model for extreme flood data. Their modelling approach aimed to account for both the river distance between hydrologically connected gaging stations and for stations that are not connected but share common meteorological events. The former is simply the distance along the river, whereas the latter is termed the hydrologic distance and is defined to be the Euclidean distance between the weighted (e.g., using precipitation climatology or elevation) centroids of their upstream drainage areas. The overall distance metric that combines both river distance and hydrologic distance is defined as:
$$\Gamma(s,t) = \lambda_{Riv}\,\Gamma_{Riv}(s,t) + \lambda_{Hydro}\,\Gamma_{Hydro}(s,t) = \lambda_{Riv} \left[ 1 - \sqrt{\pi_{s,t}} \left( 1 - \frac{d(s,t)}{\tau} \right)_{+} \right] + \lambda_{Hydro} \left\| R\,H(s) - R\,H(t) \right\|_2^{\alpha} \tag{4}$$
for any $s, t \in T$, where $\lambda_{Riv} \geq 0$, $\lambda_{Hydro} \geq 0$, $\pi_{s,t}$, $d(s,t)$, $\tau > 0$, $R = R(\beta, c)$, $H$, and $\alpha \in (0, 2]$ represent a weight that is assigned to the dependence term for flow-connected gaging stations ($\Gamma_{Riv}(s,t)$), a weight assigned to the dependence term for gaging sites that are not flow-connected ($\Gamma_{Hydro}(s,t)$), weights that account for the proportions of extreme flood discharge values coming from each branch of the river network, the river distance between sites $s, t \in T$, the distance beyond which inter-site correlation is essentially zero, a rotation and dilation matrix to account for geometric anisotropy, the hydrological location of a gaging location on the river network, and a variogram shape parameter, respectively. Understanding the desire to model any location in $T$, observed or not, Asadi et al. [22] suggested the use of elevation as a surrogate for precipitation and that values for $\pi$ be estimated by integrating elevations for the area upstream of each gaging station. Similarly, they suggested the hydrologic location be defined as the center of mass of the precipitation climatology (or elevation, as a replacement for precipitation) for each gaging site’s contributing drainage area [22]. In order to account for potential anisotropy, the rotation and dilation matrix $R$ is given by:
$$R = \begin{pmatrix} \cos\beta & -\sin\beta \\ c\,\sin\beta & c\,\cos\beta \end{pmatrix}, \quad \beta \in \left[ \frac{\pi}{4}, \frac{3\pi}{4} \right], \quad c > 0. \tag{5}$$
The parameters $\lambda_{Riv}$, $\lambda_{Hydro}$, $\tau$, $\beta$, $c$, and $\alpha$ were estimated via the fitting of the MSP. Large values of $\Gamma(s,t)$ correspond to weak dependence, whereas small values correspond to strong dependence.
The dependence measure $\Gamma_{Riv}(s,t)$ was constructed in the following manner. We defined $\Gamma_{Riv}(s,t) = 1 - \sqrt{\pi_{s,t}} \left( 1 - d(s,t)/\tau \right)_{+}$ if $s$ and $t$ were flow-connected and $\Gamma_{Riv}(s,t) = 1$ otherwise [22]. The weights $\pi_{s,t}$ reflect the number of bifurcations that occur in the river network between the two locations. We used the “linear with sill” covariance function given by $\left( 1 - d(s,t)/\tau \right)_{+}$ [22]. For additional background on this and other covariances on river networks, refer to Ver Hoef et al. [48] and Ver Hoef and Peterson [49].
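The combined distance metric can be sketched as three small functions, one per term. This is an illustrative sketch only, not the code used in this study: all function names and numerical values are hypothetical, and the square root applied to the branch weight follows the tail-up weighting convention for covariances on river networks, which is assumed here rather than confirmed by the text above.

```python
import math

def gamma_riv(flow_connected, pi_st, d_st, tau):
    """River term: 1 - sqrt(pi) * (1 - d/tau)_+ if flow-connected, else 1.

    The sqrt on pi_st is an assumption of this sketch.
    """
    if not flow_connected:
        return 1.0
    return 1.0 - math.sqrt(pi_st) * max(0.0, 1.0 - d_st / tau)

def gamma_hydro(h_s, h_t, beta, c, alpha):
    """Hydrologic term: || R (H(s) - H(t)) ||_2 ** alpha."""
    dx, dy = h_s[0] - h_t[0], h_s[1] - h_t[1]
    # Apply the rotation and dilation matrix R(beta, c) to the difference
    u = math.cos(beta) * dx - math.sin(beta) * dy
    v = c * math.sin(beta) * dx + c * math.cos(beta) * dy
    return math.hypot(u, v) ** alpha

def gamma_total(lam_riv, lam_hydro, flow_connected, pi_st, d_st, tau,
                h_s, h_t, beta, c, alpha):
    """Weighted sum of the river and hydrologic terms."""
    return (lam_riv * gamma_riv(flow_connected, pi_st, d_st, tau)
            + lam_hydro * gamma_hydro(h_s, h_t, beta, c, alpha))

# Hypothetical pair of flow-connected sites (distances in km)
g = gamma_total(lam_riv=0.5, lam_hydro=0.02, flow_connected=True,
                pi_st=0.6, d_st=40.0, tau=150.0,
                h_s=(12.0, 30.0), h_t=(25.0, 18.0),
                beta=math.pi / 2, c=1.0, alpha=1.2)
```

With `beta = math.pi / 2` and `c = 1.0`, the matrix is a pure rotation, so the hydrologic term reduces to the ordinary Euclidean distance between hydrologic locations raised to the power alpha, which is the isotropic case.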

3. Results

3.1. Marginal Fitting

Using the independent storm events that were extracted from the seasonal mean daily discharge data, the Poisson point process likelihood of Equation (1) was applied to compute unique estimates for the marginal distribution’s GEV parameter values at each of the 22 streamflow gaging stations. Initial estimates for the application of Equation (1) were obtained from the results of GEV at-site block maxima analyses that used the seasonal maxima data shown in Figure S3. The results obtained from applying Equation (1) without covariates are listed in Table 5.
Figure 9 summarizes the application of two independent elastic-net regression models that were used to identify trend surface covariates for the GEV location and scale parameters. Each model used the data listed in Table 5 and the set of covariate values (standardized) listed in Table S1, and weights were assigned in accordance with the number of exceedances at each station.
Covariate coefficient estimates obtained from the elastic-net regression models were used as initial values for a second optimization of Equation (1). The GEV location and scale parameters were allowed to spatially vary as a function of their covariate values, and the GEV shape parameter was specified to be constant throughout the river network. Figure 10 plots comparisons of estimates obtained from the spatial covariate model with their corresponding “at-site” estimates (Table 5) for the GEV location and scale parameters at the 22 streamflow gaging stations. Figure 11 presents probability plots obtained from application of Equation (1), without and with covariates, respectively, for each of the study’s 22 daily streamflow gaging stations (Figure 1, Table 2).
Table 6 and Table 7 summarize the results from 500 CV supervised elastic net optimization runs that were performed, in each case, for the GEV location, scale, and shape parameter. For each GEV distribution parameter, 100 CV supervised elastic net optimization runs were performed for each of five settings of the number of folds: 3, 9, 11, 15, and 22. Leave-one-out CV corresponds to the case in which the number of folds equals 22, the number of gaging stations. Because the objective of the marginal trend fitting method was feature selection, $\alpha$ (Equation (3)) remained fixed and close in value to one for each CV-directed elastic net run, while the number of folds was varied to examine the procedure’s sensitivity for covariate selection.

3.2. Modelling Spatial Dependence

The dependence modelling was limited to six streamflow gaging sites in the Current River Basin with a common period of record from 2002 to 2020. In particular, the analysis considered the six gaging stations with IDs 4–7, 9, and 10 (Figure 1, Table 2). With six sites, there were $\binom{6}{2} = 15$ potential pairs. Figure 12 is a plot of extremal coefficient estimates (estimated via the madogram [19]) as a function of Euclidean distance for those 15 possible pairs, with blue crosses for flow-connected pairs and black circles for flow-unconnected pairs.
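The pairwise extremal coefficient estimates can be illustrated with the F-madogram estimator, in which the extremal coefficient is recovered as (1 + 2ν) / (1 − 2ν), where ν is half the mean absolute difference between the empirical distribution function values at two sites. The sketch below is a minimal version under stated assumptions: the function names and the discharge values are hypothetical, ties are not treated specially, and the final clamp to the interval from one to two is a pragmatic choice of this sketch because the raw estimate can fall slightly outside that range in small samples.

```python
from itertools import combinations

def rank_transform(values):
    """Map values to empirical CDF positions rank / (n + 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        u[idx] = rank / (n + 1)
    return u

def madogram_extremal_coefficient(x, y):
    """F-madogram estimate of the pairwise extremal coefficient."""
    fx, fy = rank_transform(x), rank_transform(y)
    nu = 0.5 * sum(abs(a - b) for a, b in zip(fx, fy)) / len(x)
    theta = (1.0 + 2.0 * nu) / (1.0 - 2.0 * nu)
    return min(2.0, max(1.0, theta))  # clamp: an assumption of this sketch

# The six station IDs used for the dependence modelling
station_ids = [4, 5, 6, 7, 9, 10]
pairs = list(combinations(station_ids, 2))

# Made-up concurrent event maxima at two sites (illustration only)
site_a = [310.0, 150.0, 420.0, 90.0, 275.0]
site_b = [295.0, 160.0, 400.0, 120.0, 240.0]
theta_ab = madogram_extremal_coefficient(site_a, site_b)
```

Because the estimator works on ranks, it does not depend on the marginal distributions at either site, which is why it is convenient for checking a fitted dependence model.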
Brown–Resnick MSP models were fit to independent multivariate events extracted from the common discharge data, transformed to unit Fréchet margins [22], for the six sites in the Current River Basin while considering different permutations of the non-Euclidean dependence model presented in Equation (4). The first model fit only considered the second component of Equation (4) ($\lambda_{Riv} = 0$), assumed isotropic data ($\beta = \pi/2$ and $c = 1$), and specified each gaging site’s hydrologic location, $H$, to be its basin outlet rather than the center of mass of the precipitation climatology (i.e., $H(s) = s$). Hence, only the dependence parameters $\lambda_{Hydro}$ and $\alpha$ were specified as adjustable. The second fit was the same as the first but incorporated the matrix $R$ to account for geometric anisotropy. The third and fourth fits were the same as the first and second, respectively, with the exception that each gaging site’s hydrologic location, $H$, was specified to be its basin centroid rather than outlet. A final fit combined both terms of Equation (4). The computed log-likelihood values associated with each model fit were 526.44, 530.09, 556.18, 574.32, and 582.75, respectively.
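The transformation to unit Fréchet margins can be illustrated with the probability integral transform z = −1 / log G(x), where G is the marginal distribution function. When G is a GEV distribution, this simplifies to a closed form. The sketch below assumes that fitted GEV margins are used for the transform, which is an assumption of this example rather than a detail stated above; the function name and parameter values are hypothetical.

```python
import math

def gev_to_unit_frechet(x, mu, sigma, xi):
    """Transform x with GEV(mu, sigma, xi) margin to the unit Frechet scale.

    Uses z = -1 / log G(x), which for a GEV margin equals
    (1 + xi * (x - mu) / sigma) ** (1 / xi), or exp((x - mu) / sigma)
    in the Gumbel limit xi -> 0.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if abs(xi) < 1e-12:
        return math.exp((x - mu) / sigma)
    base = 1.0 + xi * (x - mu) / sigma
    if base <= 0:
        raise ValueError("x lies outside the support of the GEV margin")
    return base ** (1.0 / xi)

# Hypothetical event discharges and marginal parameters at one site
events = [950.0, 1200.0, 2100.0]
z = [gev_to_unit_frechet(x, mu=900.0, sigma=350.0, xi=0.05) for x in events]
```

After this transform every site shares the same margin, so differences between sites in the transformed data reflect dependence rather than differences in flood magnitude.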
Using the best-fitting MSP model that applied the complete dependence model of Equation (4), Figure 13a plots model-based extremal coefficient estimates as a function of hydrologic distance.
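For readers less familiar with Brown–Resnick models, the link between a fitted variogram value and the model-based extremal coefficient can be sketched as follows. The sketch assumes the convention in which Γ(s, t) is the variogram of the underlying Gaussian process (not the semivariogram), so that θ(s, t) = 2Φ(√Γ(s, t)/2). The variogram values in the loop are hypothetical and are not results from this study.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def brown_resnick_theta(variogram_value):
    """Pairwise extremal coefficient implied by a variogram value Gamma(s, t)."""
    if variogram_value < 0.0:
        raise ValueError("a variogram value cannot be negative")
    return 2.0 * std_normal_cdf(math.sqrt(variogram_value) / 2.0)


# Hypothetical variogram values, for illustration only.
for gamma in (0.0, 0.5, 1.0, 4.0, 16.0):
    print(gamma, round(brown_resnick_theta(gamma), 4))
```

The coefficient rises from 1 (complete dependence) toward 2 (independence) as the variogram value grows, which is the behavior a plot of model-based estimates against distance is expected to show.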

4. Discussion

4.1. Marginal Fitting

The GEV parameter estimates listed in Table 5 are only applicable at their respective gaging site locations. The location and scale parameter estimates were subsequently used as the response variables in two independent elastic-net regression models to identify trend surface covariates for the GEV location and scale parameters. We then fit an MSP-based model that used these elastic-net regression models to inform the makeup of trend surfaces for the location and scale parameters. This approach allowed us to estimate marginal distributions throughout the entire river network, T.
Selected covariates for the GEV location parameter included, for each gaging station’s contributing drainage basin, its outlet elevation, centroid x coordinate, centroid elevation, area, average basin width [38], elevation range, and median land surface slope (Figure 9a). Covariates selected for the GEV scale parameter included area, average basin width, and median land surface slope (Figure 9b). Because of the region’s relative homogeneity, the GEV shape parameter was assumed to be constant throughout the entire study area.
The “at-site” estimates were the 66 GEV parameter values (three parameters at each of the study’s 22 streamflow gaging stations) that were computed by optimizing Equation (1) without any covariates (Table 5). In addition to the plot presented in Figure 10, the agreement between the estimates obtained from the spatial covariate model and their corresponding “at-site” estimates (Table 5) at the 22 streamflow gaging stations was summarized using the coefficient of determination (R²), Nash–Sutcliffe efficiency (NSE) [50], and Kling–Gupta efficiency (KGE) [51]. For the GEV location parameter, the computed R², NSE, and KGE values were 0.95, 0.95, and 0.94, respectively. For the GEV scale parameter, they were 0.85, 0.84, and 0.90, respectively. For the spatial covariate model, the estimated value for the GEV shape parameter was 0.0499. The computed first, second, and third quartiles and mean of the at-site estimates for the GEV shape parameter were −0.04892, 0.20571, 0.28373, and 0.09316, respectively.
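The three agreement measures can be computed as in the following minimal sketch. It treats the at-site estimates as the reference series (`obs`) and the spatial covariate model estimates as the comparison series (`sim`), and it takes R² to be the squared Pearson correlation. Both choices are assumptions, since the study's exact computation is not reproduced here. The KGE follows the decomposition of Gupta et al. [51].

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _std(values):
    m = _mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def pearson_r(obs, sim):
    mo, ms = _mean(obs), _mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / len(obs)
    return cov / (_std(obs) * _std(sim))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squares about the obs mean."""
    mo = _mean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1.0 - sse / sum((o - mo) ** 2 for o in obs)


def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, variability and bias ratios."""
    r = pearson_r(obs, sim)
    alpha = _std(sim) / _std(obs)   # variability ratio
    beta = _mean(sim) / _mean(obs)  # bias ratio
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


def r_squared(obs, sim):
    """Coefficient of determination taken as the squared Pearson correlation."""
    return pearson_r(obs, sim) ** 2
```

All three measures equal 1 for perfect agreement. The NSE and KGE penalize bias, whereas the squared correlation does not, so reporting them together gives a fuller picture than any one alone.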
The probability plots shown in Figure 11 further summarize the quality of the fit for the spatial covariate model for modelling the marginal parameters throughout the entire river network, T. The computed Akaike information criterion (AIC) value for the 13-parameter spatial covariate model (AIC = 24,374.22) was slightly greater, by 150.74 (about 0.6%), than the AIC value obtained for the 66-parameter “at-site” model (AIC = 24,223.48). The spatial covariate model can be applied to estimate pointwise return levels for any location in T.
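A pointwise return level follows directly from the three GEV parameters at a location. The sketch below uses the standard GEV quantile formula and checks it against the GEV distribution function. The shape value 0.0499 is the estimate reported above, whereas the location and scale values are hypothetical placeholders, and the function names are assumptions.

```python
import math


def gev_return_level(mu, sigma, xi, aep):
    """Level exceeded with probability `aep` in any one block (e.g., one season)."""
    if not 0.0 < aep < 1.0:
        raise ValueError("aep must lie strictly between 0 and 1")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    y = -math.log(1.0 - aep)
    if abs(xi) < 1e-12:  # Gumbel limit
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))


def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function, used here only to verify the quantile."""
    if abs(xi) < 1e-12:
        return math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


# Hypothetical location and scale; shape as reported for the spatial model.
mu, sigma, xi = 100.0, 50.0, 0.0499
for aep in (0.1, 0.01, 0.002):
    print(aep, round(gev_return_level(mu, sigma, xi, aep), 2))
```

In the spatial covariate model the location and scale values at an ungaged point would come from the fitted trend surfaces evaluated at that point's covariates, which is what makes return levels available away from the gaging stations.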
Table 6 and Table 7 summarize the results from 1500 CV supervised elastic net optimization runs (500 for each of the three GEV parameters) that were performed to examine the sensitivity of covariate selection under the marginal trend fitting method. For each marginal distribution parameter, the most parsimonious model was always obtained using leave-one-out CV (nfolds = 22; Table 6). While increasing the number of folds provided more stability with respect to covariate selection (Table 6), the CV runs with fewer folds introduced opportunities for a better fit (Table 7), albeit with potentially more complex models. For example, the most predictive models, defined to be the ones with the greatest NSE/KGE/R² from among the 100 CV optimization runs performed with nfolds set equal to 3, were examined for the GEV location, scale, and shape parameters. For the GEV location and scale parameters, the most regularized models, each defined to be at the largest value of λ (Equation (3)) within one standard error of the minimum [15], with the greatest NSE/KGE/R² from among the 100 CV runs were of dimensions 11 and 8, respectively. The best-fitting minimum error model [15] for the GEV shape parameter was of dimension 10. By contrast, the comparable leave-one-out CV models were of dimensions 8, 4, and 1 for the GEV location, scale, and shape parameters, respectively (nfolds = 22; Table 6).
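The two model choices referred to above, the minimum error model and the most regularized model within one standard error of the minimum [15], can be sketched as a simple selection over a cross-validation path. The function name and the toy numbers are illustrative assumptions, not values from the study.

```python
def select_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_reg) from a cross-validation path.

    lambda_min minimizes the mean CV error. lambda_reg is the largest lambda
    whose mean CV error is within one standard error of that minimum.
    """
    if not (len(lambdas) == len(cv_mean) == len(cv_se)) or not lambdas:
        raise ValueError("inputs must be non-empty and of equal length")
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    candidates = [i for i in range(len(lambdas)) if cv_mean[i] <= threshold]
    i_reg = max(candidates, key=lambda i: lambdas[i])
    return lambdas[i_min], lambdas[i_reg]


# Toy CV path (hypothetical values).
lams = [1.0, 0.5, 0.25, 0.125]
errs = [3.0, 2.1, 2.0, 2.2]
ses = [0.3, 0.2, 0.15, 0.2]
print(select_lambdas(lams, errs, ses))
```

Because a larger λ shrinks more coefficients to zero, the one-standard-error choice tends to yield a model with fewer covariates than the minimum error choice.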
Spatial models were fit, using Equation (1), for these two potential covariate models of sizes 29 (11 + 8 + 10, from the CV runs with nfolds = 3) and 13 (8 + 4 + 1, from leave-one-out CV), respectively. The AIC value associated with the fitted 29-parameter spatial model was 25,651.5, whereas the spatial covariate model parameterized using the leave-one-out CV-directed elastic net results yielded a lower AIC value of 24,374.22. One possible explanation for the higher AIC value associated with the more complex spatial covariate model is that the CV-directed elastic net optimization runs could not identify a predictive covariate model for the GEV shape parameter, as measured by the reported NSE values (Table 7). The maximum reported NSE value for the GEV shape parameter was less than zero [50]. These results for the GEV shape parameter, obtained from a comprehensive CV-directed elastic net analysis, supported the assumption that it is constant, given the physiographic and climatological homogeneity that was observed throughout the study area. It is also worth mentioning that simple synthetic numerical experiments, involving the simulation of series of block maxima from GEV(0,1,0), wherein the true shape parameter is zero, with lengths equal to the number of seasonal maxima reported in Table 3, can be performed to demonstrate that the variability of the at-site shape parameter estimates reported in Table 5 is not necessarily inconsistent with an assumed constant shape parameter.
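A synthetic experiment of the kind just described can be sketched as follows. Two departures from the study are assumptions made to keep the sketch self-contained: the record lengths are hypothetical stand-ins for the values in Table 3, and the shape parameter is estimated with the L-moment approximation of Hosking and co-workers rather than by maximizing Equation (1).

```python
import math
import random
import statistics


def gumbel_maxima(n, rng):
    """Draw n block maxima from GEV(0, 1, 0) by inverting the Gumbel CDF."""
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u > 0.0:  # guard against log(0)
            draws.append(-math.log(-math.log(u)))
    return draws


def lmoment_shape(sample):
    """L-moment estimate of the GEV shape (xi > 0 means a heavy upper tail)."""
    x = sorted(sample)
    n = len(x)
    if n < 3:
        raise ValueError("at least three observations are required")
    # Unbiased probability-weighted moments b0, b1, b2.
    b0 = sum(x) / n
    b1 = sum(j * x[j] for j in range(n)) / (n * (n - 1))
    b2 = sum(j * (j - 1) * x[j] for j in range(n)) / (n * (n - 1) * (n - 2))
    l2 = 2.0 * b1 - b0
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    t3 = l3 / l2  # L-skewness
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c  # Hosking's k, where xi = -k
    return -k


def simulate_site_shapes(record_lengths, seed=0):
    """One shape estimate per site, each from an independent Gumbel series."""
    rng = random.Random(seed)
    return [lmoment_shape(gumbel_maxima(n, rng)) for n in record_lengths]


# Hypothetical record lengths for 22 sites (not the Table 3 values).
HYPOTHETICAL_LENGTHS = [14, 19, 20, 20, 26, 27] * 3 + [20, 26, 27, 30]
shapes = simulate_site_shapes(HYPOTHETICAL_LENGTHS, seed=1)
print(round(min(shapes), 3), round(statistics.median(shapes), 3), round(max(shapes), 3))
```

Even though every simulated series shares a true shape of zero, the per-site estimates scatter on both sides of zero for records of this length. That scatter is the point of the experiment, and it would be compared with the spread of the at-site estimates in Table 5.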
This study evaluated a novel covariate selection procedure within the framework of a unique MSP for modelling flood extremes on a river network [22,31]. One advantage of the marginal distribution fitting method introduced in this study is that it is applicable for alternative methods, such as Bayesian hierarchical modelling, to spatially model flood frequency [23,24,25,26,27,28,29].

4.2. Modelling Spatial Dependence

Beyond a strong dependence among all site pairs, it is difficult to identify any observable pattern with the extremal coefficient estimates plotted versus Euclidean distance in Figure 12. This contrasts with comparable plots typically obtained for extreme precipitation, SWE, temperature, and wind data [6,7,8].
The log-likelihood values of the fitted MSP models indicated improved fits when using the distance metric proposed by Asadi et al. [22], as opposed to Euclidean distance. They also showed that accounting for flow-connected dependence and anisotropy further improved model fit.
The curve of the data plotted in Figure 13a closely resembles what is commonly observed with dependence model summaries for MSP applications with extreme precipitation and SWE data [6,7,8]. The plot also shows that flow-connected sites at the same distance can have different extremal coefficients depending on their location in T. Further, in general, dependence is stronger among sites that are flow-connected. Figure 13b is a plot that compares the empirical and model-based extremal coefficient estimates. While based on a limited dataset, the observed agreement is reasonable. Flood discharge exceedances can be estimated by simulating the fitted MSP that models extreme discharge dependence on the river network and subsequently transforming the simulated values using the results from the estimated marginal distributions.
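The final transformation step mentioned above, from simulated unit Fréchet values back to the discharge scale, can be sketched with the standard relationship between the unit Fréchet and GEV distributions. The function names and the parameter values in the example are illustrative assumptions. Simulating the fitted MSP itself is outside the scope of the sketch.

```python
import math


def unit_frechet_to_gev(z, mu, sigma, xi):
    """Map a unit Frechet value z > 0 to the GEV(mu, sigma, xi) scale."""
    if z <= 0.0:
        raise ValueError("unit Frechet values must be positive")
    if abs(xi) < 1e-12:  # Gumbel limit
        return mu + sigma * math.log(z)
    return mu + sigma * (z ** xi - 1.0) / xi


def gev_to_unit_frechet(x, mu, sigma, xi):
    """Inverse map: GEV(mu, sigma, xi) value to the unit Frechet scale."""
    if abs(xi) < 1e-12:
        return math.exp((x - mu) / sigma)
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        raise ValueError("x lies outside the support of the GEV distribution")
    return t ** (1.0 / xi)


# Hypothetical marginal parameters at one location; shape as reported above.
mu, sigma, xi = 100.0, 50.0, 0.0499
simulated_z = [0.5, 1.0, 10.0, 100.0]  # stand-ins for simulated MSP values
print([round(unit_frechet_to_gev(z, mu, sigma, xi), 2) for z in simulated_z])
```

Applying the map site by site with each location's marginal parameters preserves the dependence structure of the simulated field while restoring physically meaningful discharge values.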

5. Conclusions

Modelling extremes using an MSP involves two distinct steps, trend surface fitting and modelling the inter-site extremal dependence, with each step assuming independence among the extremes and fixed margins, respectively. In this study, each step was applied to discharge data from 22 streamflow gaging stations located in the Current, Little Black, Eleven Point, Spring, and Strawberry River basins in Missouri and Arkansas. The methodology utilized here was based on a unique MSP approach specifically designed for analysis of streamflow extremes. We expanded upon the novel approach by considering a larger suite of covariates and a novel automatic covariate selection procedure. The first step involved applications of the elastic-net penalty to automatically select covariate models for the marginal distribution’s location and scale parameters from among a set of 16 potential covariates representing morphometric data associated with each gaging station’s contributing drainage basin. The spatial covariate model required 13 parameters, whereas the “at-site” model involved 66 parameter values. While the computed AIC value for the spatial covariate model was slightly greater than the AIC value obtained for the 66-parameter “at-site” model, the spatial covariate model could be applied to estimate pointwise return levels for any location throughout the river network, whereas the “at-site” model was only applicable at the study’s 22 streamflow gaging station sites (Table 5). Application of the dependence model that involved river distance and hydrologic distance rather than Euclidean distance resulted in a better fitting model of the extreme flood data, with a dependence summary that more closely resembled what is commonly observed for dependence summaries from MSP models for extreme precipitation and SWE data. Flood exceedances can be estimated throughout the entire river network using results from applications of these two steps.
There was a moderate degree of homogeneity with respect to climate, morphometry, and physiography for the five basins whose river networks were modelled in this study. In addition, the size of the discharge dataset was somewhat limited, particularly for the dependence modelling. Further related study, focusing on larger datasets and more heterogeneous systems, possibly using modified flows datasets, for example, is recommended. Adapting the likelihood for marginal fitting to combine the systematic discharge records with temporal and causal information expansion data types could be another direction to explore to potentially expand upon the capabilities of the USACE’s BestFit flood frequency analysis tool.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/geohazards4040030/s1, Figure S1: Mean Daily Discharge Data at each of the 22 stations; Figure S2: April–November Mean Daily Discharge Data at each of the 22 stations; Figure S3: Annual maxima of the seasonal (April–November) mean daily discharge data at each of the 22 stations; Figure S4: Threshold choice plots (Location, Scale, Shape); Table S1: Covariate values.

Author Contributions

Conceptualization, B.S., C.H.S. and B.T.R.; methodology, B.S., C.H.S. and B.T.R.; formal analysis, B.S.; investigation, B.S., C.H.S. and B.T.R.; writing—original draft preparation, B.S.; writing—review and editing, B.S., C.H.S. and B.T.R.; funding acquisition, B.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the U.S. Army Corps of Engineers Mississippi River Geomorphology and Potamology Program.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors thank the three reviewers for their comments which improved this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. U.S. Department of the Interior Bureau of Reclamation; U.S. Army Corps of Engineers. Best Practices in Dam and Levee Safety Risk Analysis; U.S. Department of the Interior Bureau of Reclamation: Washington, DC, USA; U.S. Army Corps of Engineers: Lakewood, CO, USA, 2019. Available online: https://www.usbr.gov/damsafety/risk/methodology.html (accessed on 16 May 2023).
  2. Smith, C.H.; Bartles, M.; Fleming, M. An Inflow Volume-Based Approach to Estimating Stage-Frequency Curves for Dams. In U.S. Army Corps of Engineers Risk Management Center Technical Report RMC-TR-2018-03; U.S. Army Corps of Engineers Risk Management Center: Lakewood, CO, USA, 2018; Available online: https://publibrary.planusace.us/document/87363a2a-8dd9-4596-991e-2f9863815c7e (accessed on 16 May 2023).
  3. Merz, R.; Blöschl, G. Flood frequency hydrology: 1. Temporal, spatial, and causal expansion of information. Water Resour. Res. 2008, 44, W08432.
  4. Merz, R.; Blöschl, G. Flood frequency hydrology: 2. Combining data evidence. Water Resour. Res. 2008, 44, W08433.
  5. Viglione, A.; Merz, R.; Salinas, J.L.; Blöschl, G. Flood frequency hydrology: 3. A Bayesian analysis. Water Resour. Res. 2013, 49, 675–692.
  6. Skahill, B.; Smith, C.H.; Russell, B.T.; England, J.F. Impacts of Max-Stable Process Areal Exceedance Calculations to Study Area Sampling Density, Surface Network Precipitation Gage Extent and Density, and Model Fitting Method. Hydrology 2023, 10, 121.
  7. Skahill, B.E.; Duren, A.M.; Cunha, L.; Bahner, C. Spatial Analysis of Precipitation and Snow Water Equivalent Extremes for the Columbia River Basin. In U.S. Army Engineer Research and Development Center Coastal and Hydraulics Laboratory Technical Report TR-20-10; U.S. Army Engineer Research and Development Center Coastal and Hydraulics Laboratory: Vicksburg, MS, USA, 2020.
  8. Skahill, B.E.; Kanney, J.; Carr, M. Analysis of snow water equivalent annual maxima in the Upper Connecticut River Basin using a max-stable spatial process model. In U.S. Army Engineer Research and Development Center Coastal and Hydraulics Laboratory Technical Report TR-20-7; U.S. Army Engineer Research and Development Center Coastal and Hydraulics Laboratory: Vicksburg, MS, USA, 2020.
  9. Ribatet, M.; Dombry, C.; Oesting, M. Spatial Extremes and Max-Stable Processes. In Extreme Value Modeling and Risk Analysis: Methods and Applications; Dey, D.K., Yan, J., Eds.; Chapman and Hall/CRC: New York, NY, USA, 2015; Chapter 9.
  10. Ribatet, M. Modelling Spatial Extremes Using Max-Stable Processes. In Nonlinear and Stochastic Climate Dynamics; Franzke, C., O’Kane, T., Eds.; Cambridge University Press: Cambridge, UK, 2017; pp. 369–391.
  11. Ribatet, M. Spatial Extremes: Max-Stable Processes at Work. J. Société Française De Stat. (Spec. Ed. Extrem. Value Theory) 2013, 154, 156–177. Available online: http://www.numdam.org/item/JSFS_2013__154_2_156_0/ (accessed on 16 May 2023).
  12. Davison, A.; Padoan, S.; Ribatet, M. Statistical Modelling of Spatial Extremes. Stat. Sci. 2012, 27, 161–186.
  13. Cooley, D.; Cisewski, J.; Erhardt, R.; Jeon, S.; Mannshardt, E.; Omolo, B.; Sun, Y. A Survey of Spatial Extremes: Measuring Spatial Dependence and Modeling Spatial Effects. RevStat 2012, 10, 135–165.
  14. Wright, D.B.; Smith, J.A.; Baeck, M.L. Critical Examination of Area Reduction Factors. J. Hydrol. Eng. 2014, 19, 769–776.
  15. Friedman, J.; Hastie, T.; Tibshirani, R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J. Stat. Softw. 2010, 33, 1–22.
  16. Simon, N.; Friedman, J.; Hastie, T.; Tibshirani, R. Regularization Paths for Cox’s Proportional Hazards Model via Coordinate Descent. J. Stat. Softw. 2011, 39, 1–13.
  17. Tibshirani, R.; Bien, J.; Friedman, J.; Hastie, T.; Simon, N.; Taylor, J.; Tibshirani, R.J. Strong rules for discarding predictors in lasso-type problems. J. R. Stat. Soc. Ser. B Stat. Methodol. 2012, 74, 245–266.
  18. Davison, A.C.; Gholamrezaee, M.M. Geostatistics of extremes. Proc. R. Soc. A 2011, 468, 581–608.
  19. Cooley, D.; Naveau, P.; Poncet, P. Variograms for spatial max-stable random fields. In Dependence in Probability and Statistics; Lecture Notes in Statistics; Bertail, P., Soulier, P., Doukhan, P., Eds.; Springer: New York, NY, USA, 2006; Volume 187.
  20. Engelke, S.; Malinowski, A.; Kabluchko, Z.; Schlather, M. Estimation of Hüsler–Reiss distributions and Brown–Resnick processes. J. R. Stat. Soc. Ser. B Stat. Methodol. 2015, 77, 239–265.
  21. Huser, R.; Davison, A.C. Space–time modelling of extreme events. J. R. Stat. Soc. Ser. B Stat. Methodol. 2014, 76, 439–461.
  22. Asadi, P.; Davison, A.C.; Engelke, S. Extremes on river networks. Ann. Appl. Stat. 2015, 9, 2023–2050.
  23. Najafi, M.R.; Moradkhani, H. Analysis of runoff extremes using spatial hierarchical Bayesian modeling. Water Resour. Res. 2013, 49, 6656–6670.
  24. Yan, H.; Moradkhani, H. A regional Bayesian hierarchical model for flood frequency analysis. Stoch. Environ. Res. Risk Assess. 2015, 29, 1019–1036.
  25. Yan, H.; Moradkhani, H. Toward more robust extreme flood prediction by Bayesian hierarchical and multimodeling. Nat. Hazards 2016, 81, 203–225.
  26. Bracken, C.; Holman, K.D.; Rajagopalan, B.; Moradkhani, H. A Bayesian hierarchical approach to multivariate nonstationary hydrologic frequency analysis. Water Resour. Res. 2018, 54, 243–255.
  27. Wu, Y.; Lall, U.; Lima, C.H.R.; Zhong, P.-A. Local and regional flood frequency analysis based on hierarchical Bayesian model: Application to annual maximum streamflow for the Huaihe River basin. Hydrol. Earth Syst. Sci. Discuss. 2018, 1–21, (preprint, in review).
  28. Wu, Y.; Xue, L.; Liu, Y. Local and regional flood frequency analysis based on hierarchical Bayesian model in Dongting Lake Basin, China. Water Sci. Eng. 2019, 12, 253–262.
  29. Sampaio, J.; Costa, V. Bayesian regional flood frequency analysis with GEV hierarchical models under spatial dependency structures. Hydrol. Sci. J. 2021, 66, 422–433.
  30. Smith, H.; Doughty, M. RMC-BestFit Quick Start Guide; U.S. Army Corps of Engineers Risk Management Center Technical Report RMC-TR-2020-03; U.S. Army Corps of Engineers Risk Management Center: Lakewood, CO, USA, 2020; Available online: https://www.iwrlibrary.us/#/document/f1767e9f-714d-43b7-cf74-ed1bd65f9dd9 (accessed on 16 May 2023).
  31. Love, C.A.; Skahill, B.E.; Russell, B.T.; Baggett, J.S.; AghaKouchak, A. An Effective Trend Surface Fitting Framework for Spatial Analysis of Extreme Events. Geophys. Res. Lett. 2022, 49, e2022GL098132.
  32. Adamski, J.C.; Petersen, J.C.; Freiwald, D.A.; Davis, J.V. Environmental and Hydrologic Setting of the Ozark Plateaus Study Unit, Arkansas, Kansas, Missouri, and Oklahoma. In U.S. Geological Survey Water-Resources Investigations Report 94-4022; National Water-Quality Assessment Program; U.S. Geological Survey Water-Resources: Little Rock, AR, USA, 1995.
  33. Wilkerson, T.F., Jr. Current River: Watershed and Inventory Assessment; Missouri Department of Conservation: West Plains, MO, USA, 2003.
  34. Miller, S.M.; Wilkerson, T.F., Jr. Eleven Point River: Watershed and Inventory Assessment; Missouri Department of Conservation: West Plains, MO, USA, 2000.
  35. Daly, C.; Halbleib, M.; Smith, J.I.; Gibson, W.P.; Doggett, M.K.; Taylor, G.H.; Curtis, J.; Pasteris, P.P. Physiographically Sensitive Mapping of Climatological Temperature and Precipitation across the Conterminous United States. Int. J. Climatol. 2008, 28, 2031–2064.
  36. Alexander, T.W.; Wilson, G.L. Technique for Estimating the 2-to 500-Year Flood Discharges on Unregulated Streams in Rural Missouri. In Water Resources Investigations Report 95-4231; United States Geological Survey: Rolla, MO, USA, 1995.
  37. Southard, R.E.; Veilleux, A.G. Methods for estimating annual exceedance-probability discharges and largest recorded floods for unregulated streams in rural Missouri. In U.S. Geological Survey Scientific Investigations Report 2014–5165; United States Geological Survey: Rolla, MO, USA, 2014; 39p.
  38. Zăvoianu, I. Morphometry of Drainage Basins. In Developments in Water Science; Elsevier: New York, NY, USA, 1985.
  39. Coles, S. An Introduction to Statistical Modeling of Extreme Values; Springer: London, UK, 2001.
  40. Blanchet, J. Max-stable processes and annual maximum snow depth. In Proceedings of the 6th International Conference on Extreme Value Analysis, Fort Collins, CO, USA, 23–26 June 2009; Available online: https://www.stat.colostate.edu/graybillconference2009/Presentations/Blanchet.pdf (accessed on 16 May 2023).
  41. Zou, H.; Hastie, T. Regularization and Variable Selection via the Elastic Net. J. R. Stat. Soc. B 2005, 67, 301–320.
  42. Hoerl, A.E.; Kennard, R.W. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics 1970, 12, 55–67.
  43. Tikhonov, A.N. On the Stability of Inverse Problems. Dokl. Akad. Nauk SSSR 1943, 39, 195–198.
  44. Tibshirani, R. Regression Shrinkage and Selection via the Lasso. J. Royal. Statist. Soc. B. 1996, 58, 267–288.
  45. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning: With Applications in R; Springer: New York, NY, USA, 2013.
  46. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Prediction, Inference and Data Mining, 2nd ed.; Springer: New York, NY, USA, 2009.
  47. Economou, T.; Stephenson, D.B.; Ferro, C.A.T. Spatio-temporal modelling of extreme storms. Ann. Appl. Stat. 2014, 8, 2223–2246.
  48. Ver Hoef, J.M.; Peterson, E.; Theobald, D. Spatial statistical models that use flow and stream distance. Environ. Ecol. Stat. 2006, 13, 449–464.
  49. Ver Hoef, J.M.; Peterson, E.E. A moving average approach for spatial statistical models of stream networks. J. Amer. Statist. Assoc. 2010, 105, 6–18.
  50. Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I—A discussion of principles. J. Hydrol. 1970, 10, 282–290.
  51. Gupta, H.V.; Kling, H.; Yilmaz, K.K.; Martinez, G.F. Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling. J. Hydrol. 2009, 377, 80–91.
Figure 1. (a) Study area, consisting of the Current, Eleven Point, Little Black, Spring, and Strawberry River basins in Missouri and Arkansas, including the locations of 22 USGS streamflow gaging stations. (b) Relative location of the study area within the Ozark Plateaus. For each plot, the horizontal axis is in degrees longitude and the vertical axis is in degrees latitude.
Figure 2. Mean monthly precipitation (units in mm) for the Current, Eleven Point, Little Black, Spring, and Strawberry River basins, computed using the gridded PRISM monthly climate dataset representative of the period 1981–2010.
Figure 3. Mean annual precipitation (values in mm) (a) throughout the Ozark Plateaus and (b) for the delineated Current, Eleven Point, Little Black, Spring, and Strawberry River basins. The mean annual precipitation values were computed using the PRISM monthly climate data set representative of the period 1981–2010. For each plot, the horizontal axis is in degrees longitude and the vertical axis is in degrees latitude.
Figure 4. Mean monthly mean and minimum temperature (units in °C) for the Current, Eleven Point, Little Black, Spring, and Strawberry River basins, computed using the gridded PRISM monthly climate dataset representative of the period 1981–2010.
Figure 5. Elevation values by basin (units in meters (m)). (a) All five modelled basins; (b) Current River Basin; (c) Little Black River Basin; (d) Eleven Point River Basin; (e) Spring River Basin; (f) Strawberry River Basin. One-arc-second resolution raster data from the USGS 3D elevation program was the source of the elevation values. For each plot, the horizontal axis is in degrees longitude and the vertical axis is in degrees latitude.
Figure 6. Slopes by basin (in degrees). (a) All five modelled basins; (b) Current River Basin; (c) Little Black River Basin; (d) Eleven Point River Basin; (e) Spring River Basin; (f) Strawberry River Basin. One-arc-second resolution raster data from the USGS 3D elevation program was the source of the elevation values that were used to compute the slopes. For each plot, the horizontal axis is in degrees longitude and the vertical axis is in degrees latitude.
Figure 7. (a) Hypsometric curves and (b) surface slopes for the Current, Eleven Point, Little Black, Spring, and Strawberry River basins, computed using one-arc-second resolution raster data from the USGS 3D elevation program.
Figure 8. A schematic diagram that presents the main elements of the marginal distribution fitting method.
Figure 9. Elastic-net cross-validation (CV) plots for the study area that summarize the results for (a) μ (cov(μ)) and (b) σ (cov(σ)), identifying trend surface covariates for the GEV location and scale parameters. Each model used the data listed in Table 5, the set of covariate values (standardized) listed in Table S1, and weights which were assigned in accordance with the number of exceedances at each station. The elastic-net CV simulations considered 16 total covariates. The x-axis is the natural logarithm of λ, the y-axis is the mean squared error (MSE), the top of the plot indicates the number of non-zero covariates as λ varies, the red markers are the CV-derived MSE with error bars indicating one standard error, and the dotted vertical lines indicate the locations of the CV-identified λ-value that minimizes the MSE (λmin) and the λ-value that defines the most regularized model (λreg) [15].
Figure 10. Comparisons of the trend surface models for the GEV location and scale parameters, using the spatial covariates identified from application of the elastic-net penalty, with their corresponding at-site estimates at the study’s 22 streamflow gaging stations whose locations are shown, by ID, in Figure 1. In each case, results were obtained from application of Equation (1), with and without covariates, respectively.
Figure 11. Probability plots for each of the study’s 22 daily streamflow gaging stations whose locations are shown, by ID, in Figure 1. For each site, results were obtained from application of Equation (1), without and with covariates, respectively (– = 95% confidence intervals).
Figure 12. Extremal coefficients (estimated using the madogram) of all pairs of gaging stations plotted against Euclidean distance; those for flow-connected pairs are blue crosses, and those for flow-unconnected pairs are black circles.
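As an illustration of how a madogram-based extremal coefficient can be computed for one pair of stations, the following is a minimal sketch. It assumes the F-madogram variant, in which each series is first transformed to rank-based probabilities; the source does not state which madogram variant was used, and the function names are hypothetical. The inputs are taken to be maxima for the years that both stations have in common, and ties are not treated specially.

```python
def empirical_cdf_values(x):
    """Rank-based probabilities rank / (n + 1), so every value lies in (0, 1)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def extremal_coefficient(x, y):
    """F-madogram estimate of the pairwise extremal coefficient.

    nu = 0.5 * mean(|F(x_i) - G(y_i)|), theta = (1 + 2 nu) / (1 - 2 nu).
    The estimate is clipped to the admissible range [1, 2].
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired series of length >= 2")
    u = empirical_cdf_values(x)
    v = empirical_cdf_values(y)
    nu = 0.5 * sum(abs(a - b) for a, b in zip(u, v)) / len(u)
    theta = (1.0 + 2.0 * nu) / (1.0 - 2.0 * nu)
    return min(max(theta, 1.0), 2.0)
```

A value near 1 indicates that the maxima at the two sites are almost fully dependent, and a value near 2 indicates near independence, which is how the vertical axis of Figure 12 is read.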
Figure 13. Fitted MSP model-based extremal coefficient estimates obtained using Equation (4), plotted against (a) hydrological distance and (b) madogram-based estimates. Blue crosses denote flow-connected pairs and black circles denote flow-unconnected pairs.
Table 1. Summary statistics of elevations and slopes by basin (units, meters (m) and degrees, respectively; Min. = minimum; Max. = maximum).
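| Basin | Elevation Min. (m) | Elevation 25% (m) | Elevation 50% (m) | Elevation 75% (m) | Elevation Max. (m) | Slope Min. (degrees) | Slope 25% (degrees) | Slope 50% (degrees) | Slope 75% (degrees) | Slope Max. (degrees) |
|---|---|---|---|---|---|---|---|---|---|---|
| Current | 101.80 | 241.50 | 308.00 | 359.61 | 490.82 | 0 | 3.63 | 6.70 | 11.16 | 55.18 |
| Little Black | 91.81 | 144.52 | 176.10 | 201.13 | 268.35 | 0 | 1.72 | 3.25 | 5.35 | 32.81 |
| Eleven Point | 91.15 | 212.04 | 266.58 | 310.45 | 465.59 | 0 | 2.51 | 4.19 | 6.65 | 48.03 |
| Spring | 79.25 | 183.92 | 227.03 | 271.67 | 383.35 | 0 | 2.57 | 4.42 | 6.87 | 43.04 |
| Strawberry | 90.89 | 176.54 | 205.98 | 231.41 | 309.67 | 0 | 2.29 | 3.73 | 5.67 | 38.81 |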
Table 2. Summary information for the study’s 22 daily streamflow gaging stations whose locations are shown, by ID, in Figure 1 (dd = decimal degrees, # = number).
| ID | USGS Station Number | Longitude (dd) | Latitude (dd) | Elevation (m) | Upstream Drainage Area (km2) | Period of Record Begin Date | Period of Record End Date | Missing Data (# of Days) |
|---|---|---|---|---|---|---|---|---|
| 01 | 7064300 | −91.73705 | 37.53023 | 354.7977 | 4.45 | 10/01/1956 | 09/30/1976 | 0 |
| 02 | 7064440 | −91.67111 | 37.44833 | 274.1449 | 152.29 | 02/07/2007 | 01/18/2021 | 1 |
| 03 | 7064500 | −91.85003 | 37.23291 | 366.1545 | 21.65 | 06/01/1949 | 10/15/1975 | 0 |
| 04 | 7064533 | −91.55281 | 37.37569 | 239.8535 | 764.05 | 08/14/2001 | 01/18/2021 | 0 |
| 05 | 7065200 | −91.66806 | 37.05611 | 257.621 | 479.15 | 10/01/2001 | 01/18/2021 | 0 |
| 06 | 7065495 | −91.44308 | 37.14817 | 200.9399 | 771.82 | 03/25/1993 | 01/18/2021 | 0 |
| 07 | 7066000 | −91.35817 | 37.15408 | 195.018 | 1030.82 | 11/01/1921 | 01/18/2021 | 0 |
| 08 | 7066500 | −91.25833 | 37.18389 | 174.3443 | 3294.47 | 08/24/1921 | 03/18/1976 | 0 |
| 09 | 7067000 | −91.01350 | 36.99139 | 136.4441 | 4317.51 | 06/18/1921 | 01/18/2021 | 0 |
| 10 | 7068000 | −90.84750 | 36.62194 | 113.4498 | 5278.40 | 06/14/1921 | 01/18/2021 | 0 |
| 11 | 7069220 | −91.52667 | 36.46028 | 137.6679 | 725.20 | 03/17/1988 | 10/04/2016 | 5665 |
| 12 | 7069295 | −91.63361 | 36.35222 | 150.2549 | 686.35 | 03/19/2010 | 01/18/2021 | 5 |
| 13 | 7069305 | −91.48278 | 36.31361 | 110.2237 | 2188.54 | 10/01/2001 | 01/18/2021 | 7 |
| 14 | 7069500 | −91.17167 | 36.20556 | 79.49311 | 3056.19 | 04/01/1936 | 01/18/2021 | 2757 |
| 15 | 7070000 | −91.92758 | 36.97039 | 358.6052 | 12.72 | 09/01/1955 | 09/30/1967 | 0 |
| 16 | 7070500 | −91.49194 | 36.78472 | 184.0703 | 934.99 | 10/01/1950 | 11/09/1976 | 0 |
| 17 | 7071500 | −91.20083 | 36.64869 | 131.1981 | 2053.86 | 10/01/1921 | 01/18/2021 | 0 |
| 18 | 7072000 | −91.11417 | 36.34639 | 93.5527 | 2926.69 | 10/01/1929 | 01/18/2021 | 2728 |
| 19 | 7073000 | −91.60833 | 36.09889 | 128.1738 | 562.03 | 03/01/1939 | 10/17/1979 | 0 |
| 20 | 7073500 | −91.61083 | 36.08056 | 129.4728 | 256.93 | 03/01/1939 | 01/30/1985 | 31 |
| 21 | 7074000 | −91.44944 | 36.11111 | 105.7302 | 1225.07 | 04/01/1936 | 09/30/2004 | 2339 |
| 22 | 7068510 | −90.57528 | 36.63167 | 91.96727 | 502.46 | 05/15/1980 | 01/18/2021 | 7464 |
Table 3. The begin date, end date, number of missing April–November mean daily data values and number of seasonal annual maxima for each of the study’s 22 daily streamflow gaging stations whose locations are shown, by ID, in Figure 1 (# = number).
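| ID | USGS Station Number | April–November Begin Date | April–November End Date | Missing Data (# of Days) | # of Seasonal Maxima |
|---|---|---|---|---|---|
| 01 | 7064300 | 04/01/1957 | 11/30/1975 | 0 | 19 |
| 02 | 7064440 | 04/01/2007 | 11/30/2020 | 0 | 14 |
| 03 | 7064500 | 04/01/1950 | 11/30/1974 | 0 | 25 |
| 04 | 7064533 | 04/01/2002 | 11/30/2020 | 0 | 19 |
| 05 | 7065200 | 04/01/2002 | 11/30/2020 | 0 | 19 |
| 06 | 7065495 | 04/01/1993 | 11/30/2020 | 0 | 28 |
| 07 | 7066000 | 04/01/1922 | 11/30/2020 | 0 | 99 |
| 08 | 7066500 | 04/01/1922 | 11/30/1975 | 0 | 54 |
| 09 | 7067000 | 04/01/1922 | 11/30/2020 | 0 | 99 |
| 10 | 7068000 | 04/01/1922 | 11/30/2020 | 0 | 99 |
| 11 | 7069220 | 04/01/1988 | 11/30/2015 | 3722 | 28 |
| 12 | 7069295 | 04/01/2010 | 11/30/2020 | 5 | 11 |
| 13 | 7069305 | 04/01/2002 | 11/30/2020 | 0 | 19 |
| 14 | 7069500 | 04/01/1936 | 11/30/2020 | 1888 | 85 |
| 15 | 7070000 | 04/01/1956 | 11/30/1966 | 0 | 11 |
| 16 | 7070500 | 04/01/1951 | 11/30/1975 | 0 | 25 |
| 17 | 7071500 | 04/01/1922 | 11/30/2020 | 0 | 99 |
| 18 | 7072000 | 04/01/1930 | 11/30/2020 | 1855 | 91 |
| 19 | 7073000 | 04/01/1939 | 11/30/1978 | 0 | 40 |
| 20 | 7073500 | 04/01/1939 | 11/30/1984 | 0 | 46 |
| 21 | 7074000 | 04/01/1936 | 11/30/2003 | 1607 | 68 |
| 22 | 7068510 | 04/01/1981 | 11/30/2020 | 4941 | 40 |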
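The seasonal maxima counted in Table 3 can be illustrated with a short sketch that reduces a daily record to one April–November maximum per year. The function name and the input layout (a list of date and discharge pairs, with `None` marking a missing day) are assumptions made for illustration. How years with missing days were treated is not stated here, so this sketch simply keeps any year that has at least one valid seasonal observation.

```python
from datetime import date


def seasonal_maxima(records, first_month=4, last_month=11):
    """Return {year: maximum mean daily discharge} for the chosen season.

    records: iterable of (datetime.date, float or None) pairs.
    Days outside the season and days with a missing value are skipped.
    """
    maxima = {}
    for day, flow in records:
        if flow is None or not (first_month <= day.month <= last_month):
            continue
        if day.year not in maxima or flow > maxima[day.year]:
            maxima[day.year] = flow
    return maxima


# Made-up values, for illustration only (not taken from any gaging station).
example = [
    (date(2001, 3, 31), 900.0),   # outside the season, ignored
    (date(2001, 4, 1), 120.0),
    (date(2001, 7, 15), 340.0),
    (date(2001, 12, 1), 800.0),   # outside the season, ignored
    (date(2002, 5, 2), None),     # missing value, ignored
    (date(2002, 11, 30), 75.5),
]
maxima_by_year = seasonal_maxima(example)
```

The length of the returned dictionary corresponds to the "# of Seasonal Maxima" column under the stated assumption about missing data.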
Table 4. The 16 covariates that were used for marginal fitting. Covariate values were readily computed for each gaging station’s contributing drainage area using a geographic information system and a one-arc-second resolution raster digital elevation model dataset from the USGS 3D elevation program.
Covariate
Outlet x coordinate
Outlet y coordinate
Outlet elevation
Centroid x coordinate
Centroid y coordinate
Centroid elevation
Area
Perimeter
Average length
Average width
Mean elevation
Minimum elevation
Maximum elevation
Elevation range
Mean land surface slope
Median land surface slope
Table 5. GEV model parameter estimates obtained from application of Equation (1), using the independent storm events that were extracted from the seasonal mean daily discharge data, for each of the study’s 22 streamflow gaging station sites whose locations are shown, by ID, in Figure 1. Each set of GEV parameter estimates is only applicable at its respective gaging site.
| ID | USGS Station Number | GEV Location | GEV Scale | GEV Shape |
|---|---|---|---|---|
| 1 | 7064300 | 7.83070 | 6.593739 | 0.203143061 |
| 2 | 7064440 | 651.81687 | 464.175609 | 0.344217689 |
| 3 | 7064500 | 201.25654 | 228.860738 | −0.144267974 |
| 4 | 7064533 | 2220.88580 | 2732.279987 | 0.233906106 |
| 5 | 7065200 | 3502.68845 | 4008.151359 | −0.195715630 |
| 6 | 7065495 | 3514.90339 | 3934.048180 | 0.296018442 |
| 7 | 7066000 | 3685.56258 | 3657.132441 | 0.246881711 |
| 8 | 7066500 | 8518.98331 | 9928.801022 | −0.002357151 |
| 9 | 7067000 | 10,000.00034 | 13,184.621414 | 0.107549044 |
| 10 | 7068000 | 12,469.20565 | 12,111.606256 | 0.208479773 |
| 11 | 7069220 | 365.82757 | 2372.683454 | −0.018406653 |
| 12 | 7069295 | 2805.65199 | 1452.731942 | 0.669369529 |
| 13 | 7069305 | 6590.52164 | 4115.879261 | 0.753694210 |
| 14 | 7069500 | 8379.45515 | 8044.792143 | 0.308929673 |
| 15 | 7070000 | 51.56659 | 74.159390 | −1.150904107 |
| 16 | 7070500 | 1096.32359 | 2370.342979 | −0.243435530 |
| 17 | 7071500 | 3217.66916 | 3532.282701 | 0.359971090 |
| 18 | 7072000 | 4238.21201 | 5571.942438 | 0.244704406 |
| 19 | 7073000 | 1488.63406 | 6172.408812 | −0.460305499 |
| 20 | 7073500 | 1307.22777 | 1423.849136 | −0.059088878 |
| 21 | 7074000 | 4722.16930 | 5034.295129 | 0.049622116 |
| 22 | 7068510 | 1566.40636 | 1946.957383 | 0.208274893 |
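To show how a set of at-site parameters from Table 5 translates into a quantile, the sketch below evaluates the GEV distribution function and its inverse. It assumes the common parameterization G(z) = exp(−[1 + ξ(z − μ)/σ]^(−1/ξ)), in which a positive shape ξ means a heavy upper tail. Equation (1) is not reproduced in this part of the article, so the sign convention is an assumption, and the function names are hypothetical.

```python
import math


def gev_cdf(z, loc, scale, shape):
    """GEV distribution function under the assumed sign convention."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    if abs(shape) < 1e-12:  # Gumbel limit as the shape tends to zero
        return math.exp(-math.exp(-(z - loc) / scale))
    t = 1.0 + shape * (z - loc) / scale
    if t <= 0.0:
        # Outside the support: below the lower bound when shape > 0,
        # above the upper bound when shape < 0.
        return 0.0 if shape > 0 else 1.0
    return math.exp(-t ** (-1.0 / shape))


def gev_quantile(p, loc, scale, shape):
    """Value z with non-exceedance probability p, i.e. gev_cdf(z) = p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    y = -math.log(p)
    if abs(shape) < 1e-12:
        return loc - scale * math.log(y)
    return loc + scale * (y ** (-shape) - 1.0) / shape


# Parameters for station ID 7 as listed in Table 5.
loc, scale, shape = 3685.56258, 3657.132441, 0.246881711
q99 = gev_quantile(0.99, loc, scale, shape)
```

Here `q99` is the level with a 1% chance of being exceeded by a single seasonal maximum under the fitted model; its units are those of the discharge data used in the fit.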
Table 6. A summary of 1500 elastic net optimization runs directed by n-fold cross validation, with α fixed close to one (Equation (3)). For each GEV distribution parameter (location, scale, shape) and number of folds (3, 9, 11, 15, 22), 100 runs were performed. The 22-fold case is equivalent to leave-one-out cross validation. Each entry is the count, out of each set of 100 runs, associated with the covariate.
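| Covariate (number of folds in parentheses) | Location (3) | Location (9) | Location (11) | Location (15) | Location (22) | Scale (3) | Scale (9) | Scale (11) | Scale (15) | Scale (22) | Shape (3) | Shape (9) | Shape (11) | Shape (15) | Shape (22) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Intercept | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Outlet x coordinate | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 6 | 23 | 13 | 12 | 0 |
| Outlet y coordinate | 3 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 7 | 5 | 3 | 3 | 0 |
| Outlet elevation | 34 | 60 | 66 | 77 | 100 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Centroid x coordinate | 76 | 99 | 100 | 100 | 100 | 7 | 0 | 0 | 0 | 0 | 10 | 13 | 19 | 14 | 0 |
| Centroid y coordinate | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 9 | 5 | 3 | 3 | 0 |
| Centroid elevation | 96 | 100 | 100 | 100 | 100 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
| Area | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 2 | 0 | 1 | 1 | 0 |
| Perimeter | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Average length | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Average width | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 1 | 0 | 0 | 0 | 0 |
| Mean elevation | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| Minimum elevation | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 38 | 45 | 41 | 34 | 0 |
| Maximum elevation | 5 | 0 | 0 | 0 | 0 | 28 | 5 | 4 | 5 | 0 | 0 | 0 | 0 | 0 | 0 |
| Elevation range | 69 | 99 | 100 | 100 | 100 | 8 | 0 | 0 | 0 | 0 | 41 | 45 | 41 | 34 | 0 |
| Mean land surface slope | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 3 | 0 | 1 | 1 | 0 |
| Median land surface slope | 100 | 100 | 100 | 100 | 100 | 91 | 100 | 100 | 100 | 100 | 5 | 5 | 2 | 2 | 0 |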
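The way an elastic-net penalty with α close to one drives some covariate coefficients exactly to zero, which is what makes a tally such as Table 6 possible, can be sketched with a small coordinate-descent solver for a single fixed penalty weight. The objective assumed here is the glmnet-style form (1/2n)‖y − b0 − Xb‖² + λ[α‖b‖₁ + (1 − α)/2 ‖b‖²]; Equation (3) is not reproduced in this part of the article, so that form, the function names, and the toy data are all assumptions. The sketch does not perform the cross validation used to choose λ and is not the software used in the study.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, returning 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lam, alpha=0.99, n_sweeps=200):
    """Coordinate descent for the assumed elastic-net objective.

    X: list of rows (each a list of covariate values), y: list of responses.
    Returns (intercept, coefficients); the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = sum(y) / n
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # mean of x_ij times the residual that excludes covariate j
            z = 0.0    # mean squared value of covariate j
            for i in range(n):
                fit_without_j = intercept + sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - fit_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            denom = z + lam * (1.0 - alpha)
            beta[j] = soft_threshold(rho, lam * alpha) / denom if denom > 0 else 0.0
        intercept = sum(
            y[i] - sum(X[i][k] * beta[k] for k in range(p)) for i in range(n)
        ) / n
    return intercept, beta


# Toy data: the response depends on the first covariate only (y = 1 + 2 * x1).
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [-3.0, -1.0, 1.0, 3.0, 5.0]
b0, b = elastic_net(X, y, lam=1.0, alpha=1.0)
selected = [j for j, coef in enumerate(b) if coef != 0.0]
```

Repeating such a fit over many random fold assignments, each with its own cross-validated λ, and counting how often each coefficient is nonzero would produce counts of the kind reported in Table 6.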
Table 7. Summary statistics associated with the 1500 elastic net optimization runs directed by n-fold cross validation, with α fixed close to one (Equation (3); Table 6). For each GEV distribution parameter (location, scale, shape) and nfolds value (3, 9, 11, 15, 22), 100 runs were performed. The 22-fold case is equivalent to leave-one-out cross validation. The summary statistics were computed from each fitted elastic net model: the most regularized model [15] for GEV location and scale, and the best-fitting minimum error model [15] for GEV shape (NSE = Nash–Sutcliffe efficiency [50]; KGE = Kling–Gupta efficiency [51]; Min. = minimum; Max. = maximum; St. dev. = standard deviation; NA = not available).
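| Metric | Statistic (nfolds in parentheses) | Location (3) | Location (9) | Location (11) | Location (15) | Location (22) | Scale (3) | Scale (9) | Scale (11) | Scale (15) | Scale (22) | Shape (3) | Shape (9) | Shape (11) | Shape (15) | Shape (22) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NSE | Max. | 0.980 | 0.978 | 0.978 | 0.977 | 0.976 | 0.919 | 0.898 | 0.898 | 0.896 | 0.685 | −0.322 | −7.664 | −1.683 | −1.683 | NA |
| NSE | Min. | 0.737 | 0.931 | 0.952 | 0.956 | 0.976 | −0.324 | 0.465 | 0.552 | 0.586 | 0.685 | −41,934.587 | −263.274 | −400.794 | −400.794 | NA |
| NSE | Mean | 0.949 | 0.970 | 0.971 | 0.973 | 0.976 | 0.655 | 0.691 | 0.697 | 0.698 | 0.685 | −1181.861 | −30.333 | −35.422 | −35.615 | NA |
| NSE | St. dev. | 0.044 | 0.008 | 0.007 | 0.005 | 0.000 | 0.251 | 0.067 | 0.057 | 0.054 | 0.000 | 6547.948 | 37.485 | 61.570 | 65.879 | NA |
| R² | Max. | 0.981 | 0.979 | 0.979 | 0.979 | 0.978 | 0.922 | 0.909 | 0.909 | 0.908 | 0.863 | 0.593 | 0.352 | 0.541 | 0.541 | NA |
| R² | Min. | 0.906 | 0.946 | 0.959 | 0.962 | 0.978 | 0.841 | 0.851 | 0.856 | 0.858 | 0.863 | 0.155 | 0.176 | 0.165 | 0.165 | NA |
| R² | Mean | 0.962 | 0.973 | 0.974 | 0.976 | 0.978 | 0.872 | 0.865 | 0.865 | 0.866 | 0.863 | 0.230 | 0.214 | 0.216 | 0.218 | NA |
| R² | St. dev. | 0.018 | 0.006 | 0.006 | 0.004 | 0.000 | 0.024 | 0.009 | 0.009 | 0.010 | 0.000 | 0.096 | 0.041 | 0.061 | 0.066 | NA |
| KGE | Max. | 0.978 | 0.972 | 0.972 | 0.971 | 0.969 | 0.951 | 0.924 | 0.924 | 0.922 | 0.650 | 0.189 | −1.459 | −0.275 | −0.275 | NA |
| KGE | Min. | 0.649 | 0.901 | 0.934 | 0.938 | 0.969 | 0.050 | 0.470 | 0.533 | 0.560 | 0.650 | −200.561 | −14.500 | −18.227 | −18.227 | NA |
| KGE | Mean | 0.930 | 0.959 | 0.961 | 0.964 | 0.969 | 0.668 | 0.662 | 0.666 | 0.667 | 0.650 | −11.682 | −3.635 | −3.761 | −3.739 | NA |
| KGE | St. dev. | 0.061 | 0.013 | 0.011 | 0.009 | 0.000 | 0.215 | 0.075 | 0.068 | 0.066 | 0.000 | 31.650 | 1.952 | 2.743 | 2.827 | NA |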
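The three goodness-of-fit measures summarized in Table 7 can be computed from paired at-site and modelled values as sketched below. The sketch assumes that R² is the squared Pearson correlation and that KGE takes the original three-component form 1 − √[(r − 1)² + (α − 1)² + (β − 1)²], with α the ratio of standard deviations and β the ratio of means; the exact variants used in the study are not stated in this part of the article, and the function names are hypothetical.

```python
import math
from statistics import fmean, pstdev


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = fmean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def pearson_r(obs, sim):
    """Pearson correlation coefficient between obs and sim."""
    mean_obs, mean_sim = fmean(obs), fmean(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov / math.sqrt(var_obs * var_sim)


def r_squared(obs, sim):
    """Coefficient of determination, assumed to be the squared correlation."""
    return pearson_r(obs, sim) ** 2


def kge(obs, sim):
    """Kling-Gupta efficiency in its assumed original three-component form."""
    r = pearson_r(obs, sim)
    alpha = pstdev(sim) / pstdev(obs)   # variability ratio
    beta = fmean(sim) / fmean(obs)      # bias ratio
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


# Made-up values: sim is perfectly correlated with obs but twice as large.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 4.0, 6.0, 8.0]
scores = (nse(obs, sim), r_squared(obs, sim), kge(obs, sim))
```

In this toy case R² is 1 while NSE and KGE are both negative, because R² ignores bias and scale errors. Under the stated assumption about R², the same mechanism would allow positive R² values to sit alongside strongly negative NSE and KGE values, as in the GEV shape columns of Table 7.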
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Skahill, B.; Smith, C.H.; Russell, B.T. Marginal Distribution Fitting Method for Modelling Flood Extremes on a River Network. GeoHazards 2023, 4, 526-553. https://doi.org/10.3390/geohazards4040030

AMA Style

Skahill B, Smith CH, Russell BT. Marginal Distribution Fitting Method for Modelling Flood Extremes on a River Network. GeoHazards. 2023; 4(4):526-553. https://doi.org/10.3390/geohazards4040030

Chicago/Turabian Style

Skahill, Brian, Cole Haden Smith, and Brook T. Russell. 2023. "Marginal Distribution Fitting Method for Modelling Flood Extremes on a River Network" GeoHazards 4, no. 4: 526-553. https://doi.org/10.3390/geohazards4040030
