1. Introduction
Environmental models are constructed to test hypotheses and make predictions in support of resource management. Simulation results are typically accompanied by significant uncertainty that must be accounted for during decision making and resource allocation. This inherent uncertainty has many sources, but herein we focus on two widely recognized sources of uncertainty: parameter variability and historical observations of system state.
Parameters are model input quantities, which are specified or predefined during model development. Parameter valuation uncertainty is addressed through the description of parameter value likelihood with probability distributions. Observations are measured quantities. From the environmental model construction perspective, observations provide “target” values used to assess model skill and capability with the goal of producing a model that simulates or predicts values similar to observed values, i.e., history matching, when the model is driven by parameter values that describe the current system state. The inherent underlying assumption promoting environmental model construction is that a model, with demonstrated history matching skill, will provide predictive capability when “new” forcing is applied that represents an unobserved scenario used to guide resource management decision making.
Data assimilation (DA) is a collection of methods and tools for the optimal combination of information from numerical model simulations with observations to obtain the “best” description of a dynamical system and the uncertainty contained within the optimal system description. It inherently accounts for parameter and observation uncertainty. In DA, the numerical model provides a forecast which is adjusted to account for or better represent observations.
DA, as an umbrella categorization, covers a variety of methods and techniques that are derived from Bayes’ theorem [1]. Contained under the DA umbrella are inverse approaches to environmental model calibration, which seek to vary input parameter values to provide the best fit between simulated and observed values, including “calibration-constrained uncertainty analyses” provided by the PEST suite of utilities [2,3,4].
Bayes’ theorem, see Equation (1), provides the fundamental starting point shared by all DA methodologies [1]. Equation (1) quantifies the model parameter uncertainty, where $k$ represents the model parameters, observations or targets used in calibration are $h$, $P()$ signifies a probability distribution, $P(k)$ is the prior parameter probability distribution, $P(h \mid k)$ is the likelihood function, and $P(k \mid h)$ is the posterior parameter probability distribution. The posterior parameter probability distribution $P(k \mid h)$ is the probability distribution of model parameters conditioned on observations [3]:

$$P(k \mid h) \propto P(h \mid k)\, P(k) \tag{1}$$
Modern Bayesian inverse techniques for hydrologic model calibration are an example of the use of DA methods in hydrology. The goal of the inversion approach is the selection of parameter values to generate the best fit between simulated values and observations, where the best-fit is quantitatively evaluated using goodness-of-fit metrics. In the inverse problem context, the observed values are “target” values. Assuming that the initial ranges of parameter values in the inverse problem are constrained by professional knowledge, the inverse problem approach to environmental model formulation seeks to adjust parameters in ways that are in harmony with expert judgment and improve history matching. This yields an approximation to the posterior parameter ensemble in a Bayesian sense,
$P(k \mid h)$ in Equation (1), and explains why “calibration-constrained uncertainty analysis” [4] is a form of DA.
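The proportionality at the heart of Bayes’ theorem (posterior proportional to likelihood times prior) can be made concrete with a small numerical sketch. The example below is illustrative only and is not the method used in this paper: it assumes a single scalar parameter, a hypothetical forward model in which the simulated value equals the parameter, a standard normal prior, and one observation with Gaussian error. All names and values are assumptions introduced for illustration.

```python
import numpy as np

def grid_posterior(k_grid, prior, likelihood):
    """Normalize likelihood * prior on a uniform grid (Riemann sum)."""
    unnormalized = likelihood * prior
    dk = k_grid[1] - k_grid[0]
    return unnormalized / (unnormalized.sum() * dk)

# Hypothetical setup: scalar parameter k, forward model h_sim = k.
k_grid = np.linspace(-6.0, 6.0, 2401)
dk = k_grid[1] - k_grid[0]

# Prior P(k): standard normal (unnormalized is fine, it is normalized later).
prior = np.exp(-0.5 * k_grid**2)

# Likelihood P(h | k): one observation h_obs with Gaussian error sigma_obs.
h_obs, sigma_obs = 1.0, 1.0
likelihood = np.exp(-0.5 * ((h_obs - k_grid) / sigma_obs) ** 2)

# Posterior P(k | h) on the grid, plus its mean and variance.
posterior = grid_posterior(k_grid, prior, likelihood)
post_mean = float((k_grid * posterior).sum() * dk)
post_var = float(((k_grid - post_mean) ** 2 * posterior).sum() * dk)

print(f"posterior mean = {post_mean:.3f}, posterior variance = {post_var:.3f}")
```

For this conjugate Gaussian case the posterior mean lies halfway between the prior mean and the observation, and the posterior variance is half the prior variance, which the grid calculation reproduces to within discretization error.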
In DA implementation, observations, $h$ in Equation (1), include an observation error model that contains “measurement error” or “observation error”, which is propagated through the assimilation to the posterior parameter probability distribution, $P(k \mid h)$. “Observation error” always includes instrument error and may include representation error, which accounts for the different ways in which the measurements and the numerical model represent the system [1]. Representation errors in numerical weather prediction and oceanographic implementations are typically errors due to scales and physical processes that are unresolved by either the numerical model or the observations [1,5]. Hereafter, representation error attributed to scale or process incompatibility between the numerical forecast model and observations is labeled “numerical representation error”.
This paper presents a discharge uncertainty envelope designed to provide an observation error model in DA implementations that use river discharge observations calculated via a rating curve from an observed stage. For this type of observation, the rating curve provides an additional source of representation error from scale and process incompatibility between the rating curve hydrodynamic model and “true” discharge. Hereafter, this form of representation error is termed “rating curve representation error”. The goal of the discharge uncertainty envelope is to optimize the bias–variance trade-off impacts to the posterior parameter probability distribution,
$P(k \mid h)$ in Equation (1).
Figure 1 graphically explains the bias–variance trade-off and the contribution of the discharge uncertainty envelope to a robust DA implementation.
For implementation, the uncertainty envelope requires a DA framework that propagates prior observation uncertainty through the analysis to posterior parameter uncertainty and adjusts the constraint on possible posterior parameter values to reflect the prior observation uncertainty. Ensemble methods, which are a type of DA implementation, provide an explicit and natural approach to accommodate a wide range of observation uncertainty representations. Consequently, the discharge uncertainty envelope is specifically designed for use with the iterative ensemble smoother (iES) algorithm [6,7] within the PEST++ suite of DA tools, hereafter PESTPP-IES, as the data assimilation engine because PESTPP-IES prior-data conflict capabilities address numerical representation error [3,7]. The discharge uncertainty envelope addresses measurement error and rating curve representation error, and the combination of PESTPP-IES and the uncertainty envelope generates a DA implementation with a complete observation error model accounting for measurement, numerical representation, and rating curve representation error.
4. Discussion
In generic DA terms, the observation uncertainty envelope provides an observation error model for discharge observations that accounts for measurement error and rating curve representation error. PESTPP-IES inherently addresses numerical representation error with its prior-data conflict resolution abilities. Combining the discharge observation uncertainty envelope with PESTPP-IES produces a DA observation error model that accounts for measurement error, rating curve representation error, and numerical representation error.
The goal of the uncertainty envelope, and all DA observation error models, is to avoid bias in and minimize the variance of the posterior parameter distribution. PESTPP-IES uses the observation error model to convert targets to a range of acceptable values for each history matching time. If the range is too large, conditioning information on parameter values will be greatly reduced, and large posterior parameter uncertainties will result. This effect will produce excess variance in predictions made with the model. If the target range is too small, then too much accuracy and precision are attributed to the observed values, and the posterior parameter ensemble may be overly narrow and biased as a result of “overfitting”. This bias in estimated parameters ultimately translates to bias in the important predictive outcomes. The alternatives of encouraging overfitting, due to artificially specifying too much target constraint, and larger spread in posterior parameter ranges, due to lack of target constraint, define a bias–variance trade-off.
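The bias–variance trade-off described above can be illustrated with the simplest possible case. The sketch below is not the PESTPP-IES algorithm; it assumes a scalar linear-Gaussian problem in which the simulated value equals the parameter, and it uses hypothetical numbers (a prior centered on 0 with unit variance and an observed value of 2.0). The function name `gaussian_update` is introduced here for illustration only.

```python
def gaussian_update(prior_mean, prior_var, obs, obs_var):
    """Scalar Bayesian update for a Gaussian prior and Gaussian observation error.

    Assumes the simulated value equals the parameter (identity forward model).
    """
    gain = prior_var / (prior_var + obs_var)
    post_mean = prior_mean + gain * (obs - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var

# Hypothetical values: the same observation assimilated with three
# different assumed observation standard deviations.
for obs_sd in (0.01, 1.0, 100.0):
    mean, var = gaussian_update(0.0, 1.0, 2.0, obs_sd**2)
    print(f"obs_sd={obs_sd:>6}: posterior mean={mean:.4f}, variance={var:.4f}")
```

With a very small assumed observation error, the posterior collapses onto the observed value, so any error in that value passes directly into the parameter estimate (overfitting and bias). With a very large assumed error, the posterior is essentially the prior, so the observation provides almost no conditioning (excess variance).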
Two alternative discharge observation error models, (1) no observation error model and (2) the standard deviation of the gauge record, were implicitly examined. The alternative of no observation error model is described in Figure 1; the concern with no observation error model is that the range of target values for each history matching time, $i$, will be too narrow and will promote overfitting and introduce bias into the posterior parameter distribution.
The other alternative discharge observation error model is the “±Standard deviation” envelope shown in Figure 6 and Figure 7. This discharge observation error model is the standard deviation of the observed monthly discharge time series added to and subtracted from the monthly averaged observed discharge. In this case, a constant standard deviation value is used for all $i$ for targets observed at the same gauging station. Because PESTPP-IES (and DA implementations in general) uses an additive Gaussian error model for each history matching time, $i$, a constant value across all $i$ will provide a poorly resolved observation error model for highly variable and dynamic observation sequences, such as those obtained from gauges in the study area and shown in Figure 3 and Figure 4.
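A minimal sketch of the standard deviation envelope follows. It assumes the sample standard deviation (`ddof=1`) is intended, which the text does not specify, and the discharge values are hypothetical rather than taken from the study gauges.

```python
import numpy as np

def std_envelope(monthly_q):
    """Constant-width envelope: monthly discharge +/- one standard deviation.

    Assumption: sample standard deviation (ddof=1) of the monthly series.
    """
    q = np.asarray(monthly_q, dtype=float)
    sd = q.std(ddof=1)
    return q - sd, q + sd, sd

# Hypothetical monthly averaged discharge values with a wide dynamic range.
monthly_q = [0.2, 0.5, 12.0, 3.0, 0.1, 25.0]
lower, upper, sd = std_envelope(monthly_q)
for q, lo, hi in zip(monthly_q, lower, upper):
    print(f"q={q:6.2f}  envelope=[{lo:7.2f}, {hi:7.2f}]")
```

Because the half-width is the same for every month, it is large relative to the low-flow values (the lower bound can even fall below zero) and comparatively small relative to the peaks, which is the poor resolution noted above for highly variable records.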
Examination of Figure 6 and Figure 7 shows that the “Observation Uncertainty” envelopes, which represent the expected uncertainty envelope observation error model, generally have a smaller range than the “±Standard deviation” envelopes, which represent the standard deviation observation error model. This suggests that the expected uncertainty envelope will produce relatively narrow posterior distributions, denoting a decrease in parameter variance and amelioration of the variance portion of the bias–variance trade-off.
The expected uncertainty envelope created by
produces a different additive Gaussian error model for each history-matching time, $i$. This permits the inclusion of different relative errors for low flows (±100%), high flows (±40%), and medium flows (±20%). It also allows the use of non-Gaussian random variates in Equation (3) to generate
values. River flow data, like most other hydrologic data sets, are often best described with non-Gaussian distributions, such as extreme value distributions.
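The flow-regime dependence can be sketched as follows. Only the three relative errors (±100%, ±40%, ±20%) come from the text; the regime thresholds `low_thresh` and `high_thresh`, the function name, and the discharge values are hypothetical, and the sketch returns simple envelope bounds rather than reproducing the stochastic procedure of Equation (3).

```python
import numpy as np

# Relative errors by flow regime, as stated in the text.
REL_ERR = {"low": 1.00, "medium": 0.20, "high": 0.40}

def regime_envelope(q, low_thresh, high_thresh):
    """Envelope whose half-width is a regime-dependent fraction of discharge.

    low_thresh and high_thresh are hypothetical regime boundaries: values
    below low_thresh are 'low' flows and values above high_thresh are 'high'.
    """
    q = np.asarray(q, dtype=float)
    rel = np.where(
        q < low_thresh,
        REL_ERR["low"],
        np.where(q > high_thresh, REL_ERR["high"], REL_ERR["medium"]),
    )
    half_width = rel * q
    return q - half_width, q + half_width

# Hypothetical discharge values, one in each regime.
lower, upper = regime_envelope([1.0, 10.0, 100.0], low_thresh=5.0, high_thresh=50.0)
print(lower, upper)
```

In contrast to a constant standard deviation, the envelope width here changes at every history-matching time because it scales with the discharge value and with the regime it falls in.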
In this study, prior-data conflicts, such as those highlighted in Figure 7, occur on the falling limbs of the hydrographs. The study environment is karst terrain characterized by complex and largely unknown enhanced flow pathways in caves, conduits, and regions of enhanced secondary porosity. We hypothesize that subsurface storm flow is an important stream flow generation mechanism across the study area and contributes significantly to the falling limb of discharge hydrographs. This physical process is not represented in the HSPF models used to produce Figure 6 and Figure 7. The watershed parameter values that are the focus of the calibration-constrained, model predictive uncertainty analysis that generated these figures will not significantly impact the representation of subsurface storm flow. Excluding these prior-data conflicts prevents the inversion process from calibrating bias into the best-fit watershed parameter values for targets to which these parameters should not be directly relevant. This is a specific example of the inherent numerical representation error handling capabilities in PESTPP-IES.
4.1. Limitations of the Expected Discharge Uncertainty Envelope
The expected uncertainty envelope provides an observation error model accounting for measurement error and representation error between the rating-curve discharge model and observations. Consequently, it is only applicable when discharge is (1) an observed value for DA and (2) calculated or estimated rather than measured. These two limitations restrict its use case to hydrologic studies focused on regional and sub-regional water budget representation that employ hydrologic routing (i.e., using hydrologic models which simulate discharge).
Studies that use hydrodynamic models, composed of partial differential equations solving for spatial and temporal variation in free surface elevation and conservation of momentum, to simulate river flow will not use discharge as an observation for DA. This type of numerical model provides a more robust and accurate calculation of discharge than a rating curve and could use observed water depth from the gauging station as observations for data assimilation. When hydrodynamic models that explicitly simulate spatially variable flow, rather than discharge, are used, other types of observations that involve direct measurement of flow velocity, such as those obtained from an acoustic Doppler current profiler (ADCP), are often preferred [28].
For the study area, the identification of flow regime is an important component of generating the expected uncertainty envelope because low flows occur frequently and provide for the largest relative error expectation (i.e., ±100%). Low flows are also a water resources management focus in this region. However, study areas, or sub-areas, with limited discharge variability, such as that shown in Figure 5, do not require a flow-regime-dependent uncertainty representation. With a relatively constant discharge record, the
values will likely be similar to the standard deviation of the record. In the case of limited dynamic variation in the observation record, a flow-regime dependent observation error model is not needed.
4.2. Goodness-of-Fit Metric Values
NSE and KGE, the two goodness-of-fit metrics used in this paper, have a range of $-\infty$, for no predictive skill, to 1, for perfect predictive skill. Examination of Table 3, Table 4, Table 5 and Table 6 suggests that KGE is generally smaller than NSE but that there is not a completely consistent relationship between the two metrics for the study area data sets and implementation. Consequently,
, which combines NSE and KGE, is used as a single goodness-of-fit metric; it has a range of
, for no predictive skill, to
, for perfect predictive skill.
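For reference, the two base metrics can be computed as in the sketch below. It assumes the standard definitions of the Nash–Sutcliffe efficiency (NSE) and the original Kling–Gupta efficiency (KGE) formulation; the paper may use a variant, and the combined metric is deliberately not reproduced here because its definition is not given in this section. The input series are hypothetical.

```python
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(sim, obs):
    """Kling-Gupta efficiency from correlation, variability ratio, and bias ratio."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)

# Hypothetical observed and simulated discharge series.
obs = np.array([1.0, 2.0, 3.0, 4.0])
sim = np.array([1.2, 1.9, 3.3, 3.8])
print(f"NSE = {nse(sim, obs):.3f}, KGE = {kge(sim, obs):.3f}")
```

Both functions return 1 for a perfect match and have no lower bound. Under these definitions, a simulation equal to the observed mean at every time gives an NSE of exactly 0.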
Discharge observations derived from stage measurements using a rating curve are expected to generate significant observation model error components for DA because a rating curve provides an imperfect hydrodynamics model for streams and rivers. Hydrodynamics models such as those in [28,29] simulate spatial and temporal variation in free surface elevation and the conservation and transport of momentum, which are the primary drivers of stream and river flow at frequencies higher than daily. When constant water density is assumed, the transport of momentum can be simplified to the transport of velocity, and velocity travels with the fluid like a scalar [29,30,31]. Consequently, there is no robust, physics-based rationale to expect a strong correlation between stage and discharge at frequencies higher than daily, and a rating curve provides a limited and imperfect hydrodynamics model at these higher frequencies. These limitations are reflected in the mean
values obtained from the comparison of stream gauge discharge data sets to the stochastically estimated discharge series; see Table 3 and Table 4.
For lower-frequency discharge estimation, such as monthly, relative improvement in mean
values in Table 5 and Table 6 suggests that a rating curve provides a more robust hydrodynamics model for monthly discharge estimation, relative to daily discharge estimation. The unbiased variate mean
, for monthly series comparison in Table 5, is always greater than 1.6, which denotes significant skill in estimation of “true” discharge values. Intuitively, it is reasonable to expect that an increase in the average stage across an interval of days to weeks correlates with an increase in the average discharge across the same interval.
PESTPP-IES produces an ensemble of parameters that provide best-fit history matching between simulated values and target values. An ensemble of parameters necessarily generates an ensemble of results. An important analysis decision when using ensemble methods is selecting which ensembles represent equally good reproduction of observed history. This selection is complicated by the fact that target data and parameterization knowledge are limited and there is uncertainty for both target and parameter values.
The NSE, KGE, and
values in Table 3, Table 4, Table 5 and Table 6 can be employed as thresholds, or criteria, for the selection of equally best-fit ensembles. The evaluation threshold for each gauge and each metric would be the unbiased variate mean value for gauging stations with “good” data quality during WY2021 (see Table 1) and the biased variate mean value for gauging stations with “fair” or “poor” data quality. Assuming that target types other than discharge are also employed, these thresholds are only some of the many criteria by which the “final” collection of equally best-fit parameter ensembles would be selected from the PESTPP-IES ensemble results. To illustrate this,
values from Table 6 are used to select the best-fit ensembles shown in Figure 6 and Figure 7.
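The threshold-based selection can be sketched as a simple filter. The sketch assumes that a realization is retained when its goodness-of-fit value meets or exceeds the gauge threshold, which is one reading of the selection rule described above; the function name, metric values, and threshold are hypothetical and are not taken from the tables.

```python
import numpy as np

def select_members(metric_by_member, threshold):
    """Return indices of ensemble members whose metric meets the threshold."""
    scores = np.asarray(metric_by_member, dtype=float)
    return np.flatnonzero(scores >= threshold)

# Hypothetical goodness-of-fit values for four ensemble members at one gauge.
scores = [0.2, 0.8, 0.5, 0.9]
kept = select_members(scores, threshold=0.5)
print(kept)
```

When several gauges or target types are used, the same filter would be applied per criterion and the retained sets intersected, so that only realizations meeting every criterion remain.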
Conceptually, these goodness-of-fit thresholds provide an estimate of the information content in the discharge record, accounting for the observation error model. If the calibrated or trained environmental model produces goodness-of-fit metric values that exceed these thresholds, the fit is good, but the model may be overfitting these targets if the calibration forces parameter selection to achieve history matching beyond the maximum thresholds.
5. Conclusions
A discharge observation uncertainty envelope is developed and presented that provides an observation error model for ensemble methods of DA. It uniquely accounts for the rating curve representation error, related to differences between the rating curve model of discharge and “true” discharge, and measurement error. In this formulation, the discharge observation uncertainty is flow regime dependent; the relative uncertainty is largest for low flows, followed by high flows, and is smallest for “normal” flows.
The goal of the uncertainty envelope is to avoid bias in and minimize the variance of the posterior parameter distribution. It is compared with two other observation error models: no error model and a standard deviation envelope. The discharge observation uncertainty envelope reduces bias relative to the no-error model because it provides for a range of target values for each history matching interval. It reduces variance relative to the standard deviation envelope because it generally provides a narrower range of target values for each history matching interval.
The observation uncertainty envelope is designed specifically for use with PESTPP-IES, which accounts for numerical representation error through prior-data conflict identification. The combination of the discharge observation uncertainty envelope and PESTPP-IES generates a DA observation error model that addresses the measurement error, rating curve representation error, and numerical representation error. Goodness-of-fit metric thresholds for NSE, KGE, and their combined metric are identified as part of the observation uncertainty envelope development and can be used as selection criteria for the identification of posterior parameter values from PESTPP-IES uncertainty analysis results.