1. Introduction
The use of surrogate models (also referred to as emulators or metamodels) and machine learning techniques has gained increasing popularity for coastal hazard assessment applications [1,2,3,4,5,6,7,8,9,10]. These approaches represent data-driven predictive tools that are calibrated using a database of synthetic storm simulations, with the ultimate goal of establishing fast-to-compute emulators that approximate (emulate) the expected storm surge with high accuracy. The output of the emulator is the storm surge (peak values or time-history evolution), and the input is the parametric features that can be used to uniquely describe each storm within the available database, for example, the parameters of the storm wind-field model. The emulator ultimately establishes an approximation of the input/output relationship and, when properly calibrated, can replace the high-fidelity numerical model utilized to develop the original synthetic storm database, offering predictions with similar accuracy and vastly improved computational efficiency. This circumvents the computational challenges associated with using such high-fidelity numerical models for real-time forecasting and storm surge risk assessment, ultimately allowing the development of efficient and versatile hazard prediction tools for emergency response management and regional planning [11,12]. Gaussian process emulators (kriging) have been proven in several studies [2,3,4,13,14,15] to offer great versatility as a surrogate modeling technique in this context, providing highly accurate storm surge predictions over large coastal regions with thousands of nodes and for databases of various sizes (number of storms) and characteristics (underlying features of the synthetic storms).
Given a database of synthetic storms, two essential steps are needed before the emulator calibration. These steps correspond, respectively, to the establishment of the input and output (observations) training data. First, a parameterization of the database is needed to define a vector of storm features that will serve as the emulator input. These features typically pertain to track, size, and intensity characteristics for each storm, and in order to account for the time evolution of these characteristics, different approaches can be established, leading to a definition based on: (i) a specific reference instance (landfall, or reference landfall for bypassing storms) [3], (ii) averaged values over some time-window around such a reference instance [16], or (iii) a description that considers the entire functional dependence over time [9,17]. Second, for nearshore or onshore nodes that have remained dry in some of the synthetic storm simulations, an imputation of the database is warranted to address the missing data corresponding to such dry instances. This imputation process provides the pseudo-surge [18], which then replaces the missing data in the original database. Though both steps are important, the emphasis of this paper is on the second step, the database imputation.
Geospatial interpolation techniques, such as kriging [14] or k nearest-neighbor (kNN) interpolation [16], have been proven appropriate for performing this imputation, leveraging the spatial correlation between nodes in order to infer missing data based on the surge values of other inundated nodes in close proximity. The selection of the exact type of geospatial interpolation depends on the number of nodes for which surge data are available: whether they correspond to the original numerical grid, which may include hundreds of thousands of nodes, or to some selected, few (up to a couple of thousand) save points. In the former case, a weighted kNN interpolation [16] has been shown to provide good accuracy and computational efficiency, whereas for the latter case, advanced geospatial interpolation techniques, such as kriging, can be explored [14], since the smaller size of the database (smaller number of nodes) does not pose insurmountable computational challenges for such techniques. Independently of the approach adopted to perform the imputation, the established pseudo-surge estimates may provide an erroneous classification for some storms, with the node classified as inundated based on the pseudo-surge (pseudo-surge value larger than the node elevation), even though it is known, based on the original database, that the node is actually dry. Some adjustment of the imputed pseudo-surge is required for such instances, since the direct use of the erroneous pseudo-surge training data will ultimately lead to poor emulator performance, over-predicting surge values for such cases. On the other hand, the type of this adjustment is also important, since it can generate gaps in the input/output data, as demonstrated later, impacting the emulator calibration process and its predictive capabilities.
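To make the imputation step concrete, the following is a minimal sketch of a weighted kNN imputation of pseudo-surge for a single storm, assuming surge values are stored with NaN marking dry nodes; the function name, the inverse-distance weighting, and the use of scipy's cKDTree are illustrative choices, not the exact implementation of [16].

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_impute_pseudo_surge(coords, surge, k=4, eps=1e-9):
    """Impute pseudo-surge at dry nodes from nearby wet nodes (one storm).

    coords : (n_nodes, 2) array of node locations (e.g., projected x/y)
    surge  : (n_nodes,) peak surge, NaN where the node stayed dry
    k      : number of wet neighbors used in the inverse-distance average
    """
    wet = ~np.isnan(surge)
    dry = ~wet
    if not dry.any() or not wet.any():
        return surge.copy()

    tree = cKDTree(coords[wet])                 # search only over wet nodes
    dist, idx = tree.query(coords[dry], k=k)    # k nearest wet neighbors
    dist = np.atleast_2d(dist); idx = np.atleast_2d(idx)

    w = 1.0 / (dist + eps)                      # inverse-distance weights
    w /= w.sum(axis=1, keepdims=True)

    pseudo = surge.copy()
    pseudo[dry] = np.sum(w * surge[wet][idx], axis=1)
    return pseudo
```

Incorporating the hydrodynamic grid connectivity, as discussed later for the case study, would replace this purely distance-based neighbor search with one restricted to nodes that are actually connected in the numerical mesh.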
This paper examines this specific problem of appropriately adjusting the imputed pseudo-surge values for the accurate, emulator-based prediction of peak storm surge. The integration of a secondary node classification surrogate model was recently proposed [16] to address the challenges associated with the erroneous pseudo-surge values generated at the imputation stage. The proposed secondary surrogate model couples logistic principal component analysis (LPCA) [19] with a kriging emulator on the resultant natural parameters of the logistic process to establish an efficient classifier, even for applications with a large number of nodes. For any node that has been misclassified at least once during the database imputation, termed a problematic node, the final surge predictions are established by combining the predictions of the secondary classification surrogate model and the primary storm surge surrogate model, which is established using the imputed database. It was recommended in [16] to adopt the predictions of the secondary classification surrogate when the primary surrogate predicts the condition of the problematic node as inundated (surge estimate larger than the node elevation), in order to counteract the propensity of the latter to overestimate the surge. This contribution considers a number of critical advances for integrating such a node classification setting in storm surge surrogate modeling. The accuracy established from this integration is examined for groups of nodes with different characteristics, instead of averaging over the entire geographical domain, providing an in-depth analysis of the benefits that such an integration can offer and of the cases for which such benefits are critical for coastal hazard assessment. A detailed comparison to an alternative implementation, considering the direct adjustment of any erroneous pseudo-surge imputations, is also presented. Furthermore, potential advantages of combining the two different surrogates (node classification and surge prediction) are discussed across all nodes, without constraining the combination only to the problematic ones. Finally, the probabilistic characterization of the node classification (condition), instead of a deterministic one, is examined for the surrogate model combination. This probabilistic characterization is facilitated directly by the underlying logistic regression for the secondary surrogate model, while for the primary surrogate model, it is developed utilizing the uncertainty quantification offered by the corresponding Gaussian process emulator. All these advances are illustrated using a database of 645 synthetic tropical cyclones (TCs) developed for a flood study in the Louisiana region. Various flood protective measures are present in the region, creating interesting scenarios with respect to the group of nodes that remain dry behind them for many of the database storms. Some adjustments in the kNN interpolation used for the geospatial imputation are also discussed, considering the node grid connectivity within the hydrodynamic surge simulation model when deciding on the selection of the neighboring grid point locations, rather than basing any decisions solely on the closest distance. The development of different surge surrogate models for parts of the database with different surge behavior is also incentivized and investigated through some of the examined comparisons.
The remainder of the paper is organized as follows: Section 2 establishes the notation formalism used in this manuscript, with Section 3 presenting an overview of the database used in this study. Section 4 reviews the kNN imputation process and motivates the need to consider the node classification surrogate model. Section 5 reviews the fundamentals of the primary and secondary surrogate model developments, while Section 6 discusses details of the combination of the two surrogate models, distinguishing between different approaches based on the node characteristics at the database imputation stage. Finally, Section 7 presents detailed results from the application of the combined surrogate model implementation to the Louisiana case-study database.
2. Notation Formalism
A database of $n$ synthetic storms is available that provides surge predictions for a total of $n_z$ nodes within the computational domain. For the surrogate model development, each of the synthetic storms is parameterized through the $n_x$-dimensional vector $\mathbf{x}$, which will serve as the emulator input, with $x_i$ denoting the $i$th input component. Further details on the selection of $\mathbf{x}$ for the case study database will be provided in Section 3. The input vector for the $h$th storm will be denoted as $\mathbf{x}^h$. Let $z_i^h$ denote the peak surge for the $i$th node and the $h$th storm. Notation $z_i(\mathbf{x}^h)$ will also be used for the surge of the $i$th node when the explicit dependence on the storm input vector needs to be highlighted. Let $\mathbf{z}$ denote the $n_z$-dimensional surge vector, with its components corresponding to the surge values for all individual nodes of interest. All notations, including the subscript, superscript, and input dependencies, extend to the vector notation $\mathbf{z}$ as well. For example, $\mathbf{z}^h$ corresponds to the vector of peak surge for all nodes for the $h$th storm, which is described through the input vector $\mathbf{x}^h$. The classification of the $i$th node condition for the $h$th storm is denoted by $c_i^h$, or $c_i(\mathbf{x}^h)$ when the dependence on the storm input needs to be explicitly noted, with the convention that $c_i^h = 1$ corresponds to the node being inundated (wet) and $c_i^h = 0$ to the node being dry. The dry instances $c_i^h = 0$ correspond to missing data in the original database, with no predictions for the corresponding storm surge $z_i^h$. The nodes for which $c_i^h$ is equal to 1 across all storms correspond to nodes that have been inundated across the entire database and will be referenced as “always wet” nodes, whereas the inland nodes for which $c_i^h$ is equal to 0 for at least one storm will be referenced as “once dry” nodes. The latter node group has at least one missing value (the surge is not provided for at least one storm) across the suite of storms. The number of at least once dry nodes will be denoted as $n_d$. Additionally, let the elevation of the $i$th node be $e_i$, and for each node define the surge gap as:

$$\delta_i = \min_h \left( e_i - z_i^h \right),$$

where $\min_h(\cdot)$ denotes the minimum of the quantity inside the parentheses across all the storms in the database. The surge gap will be used later on to distinguish the dry nodes into different groups.

Finally, assembling the data across all storms, let $\mathbf{X}$, $\mathbf{Z}$, and $\mathbf{C}$ denote the matrices for the storm input, surge, and node classification, respectively, whose rows correspond to the characteristics for individual storms. Across the manuscript, lower case variables denote characteristics for specific storms, and upper case variables refer to characteristics across the entire database. Matrix $\mathbf{X}$ has dimension $n \times n_x$, with rows corresponding to $\mathbf{x}^h$; $\mathbf{Z}$ has dimension $n \times n_z$, with rows corresponding to $\mathbf{z}^h$; and $\mathbf{C}$ has dimension $n \times n_z$, with the $\{h,i\}$ element ($h$th row and $i$th column) corresponding to $c_i^h$. The instances/elements in matrix $\mathbf{C}$ that correspond to value 0 represent the missing data in the original $\mathbf{Z}$ matrix. The database for the peak storm surge ultimately provides the parametric input matrix $\mathbf{X}$ and the peak surge matrix $\mathbf{Z}$. The classification matrix $\mathbf{C}$ is derived based on $\mathbf{Z}$, with 0 representing instances of missing surge values (node is dry) and 1 the rest (node is inundated, so a surge prediction is available).
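As a short illustration of how the classification matrix and the surge gap follow from the surge database, consider the sketch below; it assumes $\mathbf{Z}$ is stored with NaN entries for the dry (missing) instances and that the surge gap corresponds to the reconstruction adopted above, i.e., the minimum of $e_i - z_i^h$ across storms. All names are illustrative.

```python
import numpy as np

def classification_and_gap(Z, elev):
    """Derive the node-condition matrix C and the per-node surge gap.

    Z    : (n, n_z) peak surge matrix, NaN where a node stayed dry
    elev : (n_z,) node elevations e_i
    """
    C = (~np.isnan(Z)).astype(int)          # 1 = wet, 0 = dry (missing surge)
    always_wet = C.all(axis=0)              # inundated in every storm
    once_dry = ~always_wet                  # at least one missing value
    # surge gap: minimum of (e_i - z_i^h) across storms (NaN instances ignored)
    surge_gap = np.nanmin(elev[None, :] - Z, axis=0)
    return C, always_wet, once_dry, surge_gap
```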
3. Louisiana Database Overview
The database used in this study is part of the U.S. Army Corps of Engineers’ (USACE) Coastal Hazards System (CHS; https://chs.erdc.dren.mil (accessed on 15 April 2022)) [12]. The CHS Louisiana Coastal Study (CHS-LA) was conducted to quantify storm hazards and coastal compound flooding in Louisiana, including areas in the vicinity of the Greater New Orleans Hurricane and Storm Damage Risk Reduction System (HSDRRS). The storm suite developed for CHS-LA consists of $n = 645$ synthetic tropical cyclones (TCs), separated into eight main tracks, referenced herein as master tracks (MTs), as shown in Figure 1 and Figure 2. The grouping of the different tracks is based on the storm heading direction at the final approach before landfall.
Figure 1 shows all unique storm tracks, separated into the MTs based on color, whereas Figure 2 presents separately the tracks within each MT, additionally establishing a magnification closer to landfall. All storms are characterized by unique combinations of the following parameters: landfall location, defined by the latitude, $x_{lat}$, and longitude, $x_{lon}$, of the storm track; heading direction during final approach to landfall, $\beta$; central pressure deficit, $\Delta p$; translational speed, $v_t$; and radius of maximum wind speed, $R_{mw}$. The heading direction ($\beta$) dictates the MT (as shown in each of the subplots in Figure 2), and the combination of latitude $x_{lat}$ and longitude $x_{lon}$ the specific track within each MT. The remaining parameters dictate the strength ($\Delta p$), size ($R_{mw}$), and speed ($v_t$) characteristics of each synthetic storm, further distinguishing storms that might correspond to the same track. These characteristics have been held relatively constant prior to landfall, as shown in Figure 3, which illustrates the variation of these storm parameters for a typical storm of the database.
Table 1 summarizes the range of the TC parameters that constitute the database. The reported values for each parameter correspond to the ones at peak storm intensity in each case.
The simulation of the 645 synthetic TCs in the CHS-LA database was performed using high-resolution, high-fidelity atmospheric and hydrodynamic numerical models. The parameters of the synthetic TC suite were first used as input to drive a Planetary Boundary Layer (PBL) model on a nested grid to generate the wind and pressure fields used as forcing in the hydrodynamic modeling. The hydrodynamic simulations were performed by coupling the ADCIRC (Advanced Circulation) model [20] and the SWAN (Simulating Waves Nearshore) wave model [21]. The ADCIRC mesh grid, corresponding to the v14a grid including the 2023 update of the coastal protection systems for the greater Louisiana region, consists of close to 1.6 million nodes and 3.1 million triangular elements. A subset of the entire domain will be considered for the metamodel development, focusing on areas around New Orleans, constrained by latitude (28.5°, 40°) N and longitude (86°, 93.5°) W. This corresponds to a total of 1,179,179 nodes, with 488,216 being dry in at least one storm.
Figure 4 presents the dry/wet information for nodes and storms using histograms of (a) the percentage of storms for which each node is inundated, and (b) the percentage of nodes that are inundated for each storm. Both histograms are presented as relative frequency plots, with the total number of elements per bin divided by the total number of elements, which is $n_z$ for part (a) and $n$ for part (b). Note that for part (a) of Figure 4, a percentage equal to 1 corresponds to the always wet set.
The geographic domain includes various flood protection systems around the Greater New Orleans area. Within the ADCIRC numerical model, some of these systems were modeled using disconnected grids (no elements connecting nodes) with appropriate constraints across nodes to couple the water elevation.
Related to the storm parameterization, the characteristics at landfall are chosen to define $\mathbf{x}$, taking into account the aforementioned small variability of the size, intensity, and speed along the synthetic storm track (demonstrated in detail in Figure 3). This leads to an input vector that includes the latitude, $x_{lat}$, and longitude, $x_{lon}$, for reference landfall; the heading direction for the storm MT, $\beta$; the central pressure deficit, $\Delta p$; the radius of maximum winds, $R_{mw}$; and the translational speed, $v_t$:

$$\mathbf{x} = [x_{lat} \;\; x_{lon} \;\; \beta \;\; \Delta p \;\; R_{mw} \;\; v_t]^T.$$

A simplified, piece-wise linear coastal boundary is utilized for defining the reference landfall, also shown in Figure 1 (white solid line). The boundary simplification is chosen based on the recommendations in [14] to avoid any ambiguous definition of landfall due to the existence of bays. The input parameters $x_{lat}$ and $x_{lon}$ are defined at the location where the synthetic storm track crosses this boundary.
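The following sketch illustrates one way to locate the reference landfall as the crossing of a storm track with the simplified piece-wise linear boundary; the use of shapely and the handling of multiple crossings are illustrative assumptions rather than the CHS-LA procedure.

```python
from shapely.geometry import LineString

def reference_landfall(track_lonlat, boundary_lonlat):
    """Return (lat, lon) where the track crosses the simplified boundary.

    track_lonlat    : list of (lon, lat) points along the synthetic storm track
    boundary_lonlat : list of (lon, lat) vertices of the piece-wise linear boundary
    """
    track = LineString(track_lonlat)
    boundary = LineString(boundary_lonlat)
    crossing = track.intersection(boundary)
    if crossing.is_empty:
        return None                  # bypassing storm: no crossing of the boundary
    # the intersection may be a single point or a collection; keep one crossing point
    point = crossing if crossing.geom_type == "Point" else list(crossing.geoms)[0]
    return point.y, point.x          # (latitude, longitude)
```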
6. Combination of Surrogate Models for Storm Surge and Node Classification
This section considers the formal integration of the two surrogate models, the surge surrogate and the node classification surrogate, presented individually in Section 5. This integration focuses on applications that adopt the pseudo-surge database as observations for the surge surrogate and intends to correct the erroneous information retained in this database for the node condition across the set of problematic nodes. Though components of the integration may also be considered for applications that use the corrected pseudo-surge database as observations for the surge surrogate, the intention of providing a priori adjustments for the imputed surge in that case is to rely strictly on the surge surrogate, since no erroneous information has been retained in the database.
Considering the integrated metamodel formulation, predictions for the storm surge are established by the surge surrogate model through a combination of Equations (6) and (8). The classification of the node condition can be established by combining the deterministic predictions (Equation (9) for the surge surrogate and Equation (15) for the classification surrogate) or the probabilistic predictions (Equation (10) for the surge surrogate and Equation (14) for the classification surrogate) of the two surrogate models. The combination, which will be examined in this section, ultimately provides the node classification of the integrated metamodel formulation, referenced herein as the combined node classification. This combined classification finally adjusts the surge predictions: if a node is classified as inundated, the predicted surge is taken equal to the surge surrogate estimate; else the node is predicted as dry.
For the combination of the two surrogates, the following classes of nodes can be distinguished:
Nodes that were inundated for the entire database (always wet), referenced herein as the always-wet class. As discussed earlier, for these nodes, only predictions from the surge surrogate are available, so the node condition classification is based entirely on that metamodel.
The problematic nodes, referenced herein as the problematic class, that were misclassified for at least one storm during the database imputation.
The remaining nodes, referenced herein as the remaining class, that were dry for at least one storm in the database but for which the database imputation did not contribute to any misclassifications.
A further adjustment can be established in the definition of the problematic and remaining classes based on the imputation misclassification statistics of each node, namely the maximum surge misinformation (the largest amount by which an erroneous pseudo-surge exceeds the node elevation) and the misclassification percentage. Any nodes that correspond to small values for both quantities, for which the quality of the available information should be considered as high (even though some erroneous information, expressed through misclassifications, still exists), can be moved from the problematic class to the remaining class. In the case study that will be discussed later, the thresholds used for the maximum surge misinformation and the misclassification percentage are 5 cm and 2.0%, respectively; a compact sketch of this grouping is given after this paragraph. It should be noted that the problematic class is expected to include nodes in areas of complex geomorphology, for example behind or close to protective structures, for which the database imputation is expected to face challenges, as discussed in Section 3. The distinction of the nodes among the different groups is, though, purely based on the misclassification statistics from the database imputation, without the need to incorporate any additional considerations about the location of the nodes. This is motivated by the fact, as was also stressed earlier, that the objective of the database imputation and the classifier integration is the improvement of the data-driven surge predictions.
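The sketch below illustrates this purely statistics-based grouping; it assumes the misclassification percentage and the maximum surge misinformation are the two quantities compared against the 2% and 5 cm thresholds, and all names are illustrative.

```python
import numpy as np

def assign_node_classes(C, pseudo_surge, elev, pct_thresh=0.02, gap_thresh=0.05):
    """Assign nodes to the always-wet, problematic, or remaining class.

    C            : (n, n_z) true node condition (1 wet, 0 dry)
    pseudo_surge : (n, n_z) imputed database (pseudo-surge fills the dry gaps)
    elev         : (n_z,) node elevations
    pct_thresh   : misclassification-percentage threshold (adjusted definition)
    gap_thresh   : maximum surge-misinformation threshold in meters
    """
    always_wet = C.all(axis=0)
    # imputation misclassification: pseudo-surge above elevation at a known-dry instance
    false_wet = (pseudo_surge > elev[None, :]) & (C == 0)
    mis_pct = false_wet.mean(axis=0)
    mis_amount = np.where(false_wet, pseudo_surge - elev[None, :], 0.0).max(axis=0)

    problematic = (~always_wet) & (mis_pct > 0)
    # adjusted definition: small, benign misclassifications stay with the remaining class
    adjusted_problematic = problematic & ~((mis_pct < pct_thresh) & (mis_amount < gap_thresh))
    remaining = (~always_wet) & ~problematic
    return always_wet, problematic, adjusted_problematic, remaining
```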
For establishing rules regarding the combination of the surrogate model predictions, the following two characteristics should be considered. First, binary classification problems, like the one addressed by the classification surrogate, are in general more challenging metamodeling applications [22]. For this reason, the predictive accuracy of the surge surrogate is expected to be higher, at least when the database that is used for its calibration does not include any erroneous information. Second, for the problematic class, predictions from the surge surrogate will tend to be misclassified as false positives (nodes that are dry will be characterized as wet), since, as discussed in Section 4.2 and also clearly illustrated in Figure 3, the pseudo-surge database is biased that way. If the surge surrogate predicts such nodes as dry, then its predictions can be trusted, since they run opposite to the potential metamodel bias. If, on the other hand, a node is predicted as wet, then, due to the propensity of the surge surrogate for false positive misclassifications, little credibility should be given to those predictions.
Considering these two characteristics, the following recommendations were given in [16] for establishing the combined node classification: based on the expectation that the surge surrogate will benefit from higher surrogate model accuracy, rely primarily on this metamodel, and utilize the classification surrogate only as a safeguard against the false-positive misclassification propensity in the problematic class. This means that for the always-wet and remaining classes, predictions are established utilizing the surge surrogate only, while for the problematic class, the predictions of the classification surrogate are preferred only when the surge surrogate predicts the node as inundated. This leads to the combined classification rule of Equation (23). For all instances for which the combined classification indicates a wet node, the surge estimates are provided directly by the surge surrogate predictions; the definition of Equation (23) guarantees that for all such instances the surge estimate indeed corresponds to the node being inundated. For instances for which the combined classification indicates a dry node, the node is predicted as dry.
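A minimal sketch of this surge-prioritization rule for a single storm is given below, assuming a 0/1 classification array from the secondary surrogate and a boolean mask of the problematic nodes; variable names are illustrative.

```python
import numpy as np

def surge_prioritization(z_hat, c_class, elev, problematic):
    """Combined node classification following the surge-prioritization rule (one storm).

    z_hat       : (n_z,) surge-surrogate predictions
    c_class     : (n_z,) 0/1 predictions of the secondary classification surrogate
    elev        : (n_z,) node elevations
    problematic : (n_z,) boolean mask of problematic nodes
    """
    c_surge = (z_hat > elev).astype(int)     # classification implied by the surge surrogate
    c_final = c_surge.copy()
    # for problematic nodes that the surge surrogate calls wet, defer to the classifier
    override = problematic & (c_surge == 1)
    c_final[override] = c_class[override]
    # report the surge value only where the combined classification is wet
    surge_final = np.where(c_final == 1, z_hat, np.nan)
    return c_final, surge_final
```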
This integrated implementation described by Equation (23) uses the secondary node classification surrogate only for the problematic nodes. Here we revisit the above integration to examine the potential benefits for the predicted surge across all nodes. This requires relaxing the higher trustworthiness given to the surge surrogate predictions and treating the predictions from the two surrogates as having a similar degree of credibility. Additionally, it requires using the probabilistic predictions associated with each surrogate model instead of the deterministic ones. If the binary classifications of the two surrogates were to be combined, the requirement to provide a binary combined classification would lead to prioritizing one node condition, for example, predicting the node as wet if either of the two surrogates classifies it as wet, which creates a bias towards this condition. Instead, the probabilistic predictions are utilized to establish the probability of the $i$th node being wet by the combined model, denoted by $P_i^w$ herein, and the final classification is established based on this $P_i^w$ value. A weighted average approach is adopted to establish $P_i^w$ using the wet probability provided by the surge surrogate (Equation (10)), denoted here as $P_i^s$, and the one provided by the classification surrogate (Equation (14)), denoted here as $P_i^c$, leading to:

$$P_i^w = w_{s,i}\, P_i^s + w_{c,i}\, P_i^c \qquad (24)$$

where $w_{s,i}$ is defined as the weight given to the surge surrogate model, and $w_{c,i}$ the weight given to the classification metamodel. For both the problematic and the remaining classes, the classification is established using this $P_i^w$ information, leading to the classification rule of Equation (25). Of course, since the weights $\{w_{s,i}, w_{c,i}\}$ are node-dependent, the representation of Equation (25) is very versatile.
The selection of these weights is made to reflect the degree of confidence in each model. For example, for the case examined in Equation (23), $w_{s,i} = 1$ (the surge surrogate is given substantially higher confidence), apart from nodes in the problematic class for which the surge surrogate predicts the node as inundated, where $w_{s,i} = 0$ (the surge surrogate is given no confidence due to the known propensity to overestimate the surge for this class). This combination will be denoted as “surge prioritization”. Another extreme case would be to always trust the classification surrogate predictions, leading to always assigning $w_{c,i} = 1$. This combination will be denoted as “classification prioritization”. On the other hand, a balanced implementation would weigh both the surge and classification surrogate predictions for all instances for which these predictions can be deemed reliable, for example, using equal weights $w_{s,i} = w_{c,i} = 0.5$. Since, as discussed earlier, for the nodes in the problematic class for which the surge surrogate predicts the node as inundated the surge surrogate predictions cannot be regarded as reliable, $w_{s,i} = 0$ should be adopted for such instances. Though some arguments can be made that for values of $P_i^s$ close to 0.5 (so the node is classified as wet with a small margin) some degree of trustworthiness exists even for the surge surrogate predictions, defining appropriate thresholds to quantify what close to 0.5 means in this instance is tricky. If, on the other hand, the surge surrogate predicts the node as dry, then both models can be combined to provide $P_i^w$, even for the problematic class. This implementation will be denoted as the “balanced combination” and corresponds to the weight assignment of Equation (26).
It should be noted that the weight given to the classification surrogate follows in all cases as $w_{c,i} = 1 - w_{s,i}$.
For the classification prioritization and the balanced combination, some further adjustment is needed for providing the final surge prediction. Even though, as discussed earlier, in the surge prioritization the surge estimates for all instances classified as wet can be provided directly by the surge metamodel (since it is guaranteed that the corresponding surge estimate exceeds the node elevation), the same does not hold across the other two variants. Certain instances classified as wet by the combined model might be associated with surge predictions below the node elevation, i.e., the wet probability given by the surge surrogate is smaller than 0.5, but the combined probability ends up above the classification threshold through the contribution of the classification surrogate in Equation (24). This means that a node is classified as wet thanks to the higher confidence of the classification surrogate, but the surge surrogate, which is supposed to offer the value of that surge, has an estimate that is below the elevation, indicating on its part that the node is dry. Thus, for those points, although their condition has been determined probabilistically as wet, the respective surge estimate provided by the surge metamodel is below the node elevation. For this reason, the following modification is established: for instances classified as wet by the combined model, the surge estimate is taken equal to the surge surrogate prediction if that prediction exceeds the node elevation; else it is set equal to some margin (taken as 2.0 cm in this study) over the node elevation. This adjustment guarantees that the node is classified indeed as inundated based on the assigned surge predictions for all instances classified as wet.
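The sketch below summarizes the probabilistic combination and the elevation-margin adjustment just described; the 0.5 classification threshold and the symbols follow the reconstruction adopted in this section and should be read as assumptions.

```python
import numpy as np

def probabilistic_combination(p_surge, p_class, w_surge, z_hat, elev, margin=0.02):
    """Blend the wet probabilities of the two surrogates and adjust the surge output.

    p_surge, p_class : (n_z,) wet probabilities from the surge / classification surrogates
    w_surge          : (n_z,) node-dependent weight given to the surge surrogate
    z_hat            : (n_z,) surge-surrogate predictions
    elev             : (n_z,) node elevations
    margin           : elevation margin in meters for wet calls that contradict z_hat
    """
    p_wet = w_surge * p_surge + (1.0 - w_surge) * p_class   # weighted-average probability
    c_final = (p_wet >= 0.5).astype(int)                     # assumed 0.5 threshold
    # keep the surge estimate when it supports the wet call; otherwise lift it just above elevation
    adjusted = np.where(z_hat > elev, z_hat, elev + margin)
    surge_final = np.where(c_final == 1, adjusted, np.nan)
    return c_final, surge_final
```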
One can further extend these concepts to utilize a strictly probabilistic classification, in other words, use the combined probabilities as the final predictions instead of converting them to a binary classification, but such an implementation falls outside the scope of this study. The intention here is to provide surge predictions for new storms, which in turn requires a deterministic classification for each storm.
The validation of the combined surrogate model implementation directly follows the guidelines provided in Section 5.3, with the only requirement being to replace in all instances the surge-surrogate-based classification with the combined node classification. This validation is examined next within the case study to assess the appropriateness of the different techniques introduced here to facilitate the database imputation and the storm surge predictions.
7. Case Study Implementation
This section presents results for the case study with emphasis on the impact of the integration of the surge and classification surrogate models. This is accomplished by comparing the implementations of the pseudo-surge database and the corrected pseudo-surge database, examining both formulations for defining the combined node classification (as discussed in Section 6), and presenting results for different groups of nodes, separately for the always-wet, problematic, and remaining classes, as well as for nodes with different surge gaps. Two different settings are considered for the problematic and remaining classes; the first one uses all the problematic nodes, and the second one moves nodes corresponding to a misclassification percentage below 2% and a maximum surge misinformation below 5 cm from the problematic class to the remaining class. These alternative class definitions are referenced as the adjusted problematic and adjusted remaining classes. The number of such nodes is 109,701.
Table 2 presents the number of nodes corresponding to the different group definitions, along with the respective percentages of instances for which these nodes are inundated within the original database. It is evident from this table that nodes that correspond to larger surge gaps (even as large as 0.25 m) are predominantly dry within the original database.
Unless specified otherwise, all results (including the numbers presented in Table 2) refer to the kNN implementation that incorporates the ADCIRC connectivity. Across all implementations, the weights for the balanced combination (Equation (26)) are chosen as $w_{s,i} = w_{c,i} = 0.5$.
Initially, some results are presented separately for each of the two surrogate models (surge and classification) across all the nodes, briefly examining the selection of the number of principal components and the impact of overfitting before the emphasis is shifted to the integration of the two surrogate models.
Two different validation implementations are considered: (i) LOOCV without repeating the PCA (for the surge surrogate) or LPCA (for the classification surrogate) and the hyper-parameter calibration, and (ii) k-fold cross-validation (Section 5.3). These will be denoted as LOOCV and k-fold CV, respectively. Ten (10) different folds were used for the k-fold validation implementation. It should be stressed that, as discussed in Section 5.3, the k-fold CV corresponds to the proper cross-validation implementation, with the PCA (or LPCA) and the hyper-parameter calibration repeated after the removal of each set of storms. As such, it will be considered as the reference in all comparisons. Since k-fold CV has, though, a substantial computational burden, LOOCV is explored as a more efficient alternative.
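For illustration, the sketch below outlines a proper k-fold cross-validation over storms in which the PCA and the Gaussian-process hyper-parameters are re-calibrated within each fold; scikit-learn components stand in for the actual kriging formulation of Section 5, and the use of independent per-component GPs is a simplifying assumption.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def kfold_surge_cv(X, Z, n_components=20, n_splits=10, seed=0):
    """k-fold CV for a PCA + Gaussian-process surge surrogate (storms as samples)."""
    Z_pred = np.zeros_like(Z)
    for train, test in KFold(n_splits=n_splits, shuffle=True, random_state=seed).split(X):
        # PCA (and its centering) is re-estimated on the training storms only
        pca = PCA(n_components=n_components).fit(Z[train])
        latent = pca.transform(Z[train])
        # one GP per retained latent component; hyper-parameters re-calibrated per fold
        kernel = ConstantKernel() * RBF(length_scale=np.ones(X.shape[1]))
        latent_pred = np.column_stack([
            GaussianProcessRegressor(kernel=kernel, normalize_y=True)
            .fit(X[train], latent[:, j]).predict(X[test])
            for j in range(latent.shape[1])
        ])
        Z_pred[test] = pca.inverse_transform(latent_pred)
    return Z_pred
```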
7.1. Selection of Principal Components and Examining Overfitting Challenges
A parametric investigation is initially performed to examine the impact of the number of retained latent components for the surge surrogate and for the classification surrogate on the metamodel accuracy. For both metamodels, the formulation incorporating the residual discussed in Sections 5.1 and 5.2 is presented, denoted as “Residual” in the plots. Additionally, both LOOCV and k-fold CV results are presented to establish thorough comparisons between them.
Figure 9 shows results for the classification metamodel, looking at the average misclassification for an increasing number of latent components. Figure 10 presents the results for the surge metamodel for the use of the pseudo-surge database, presenting both the average correlation coefficient and the surge score for an increasing number of latent components.
Results indicate that as the number of latent components increases for either metamodel, the accuracy of both improves, as expected. The incorporation of the residuals improves the accuracy when a small number of principal components is used, but for a sufficiently large number, it offers no benefits. As explained in detail in [22], the incorporation of this residual substantially increases the computational complexity, especially the memory requirements for accommodating the metamodel predictions. Trends, therefore, indicate that the use of a larger number of principal components without the incorporation of the residual in the metamodel formulation is the preferred implementation. For the classification metamodel, a number of principal components close to 12–15 offers the greatest accuracy, while for the surge metamodel, a constant improvement is observed until an accuracy plateau is reached. This behavior is similar to the one reported in study [22]. For reference, if the implementation of [25] was used to address strictly the LPCA overfitting, the optimal number of principal components would have been 35. As stressed earlier, this number needs to be adjusted appropriately while considering the coupling with the surrogate model error through the proposed parametric investigation, which in this case yields a significantly lower number of components in the order of 12–15.
Considering the difference between the LOOCV and k-fold CV, although LOOCV seems to offer higher estimates of the metamodel accuracy (more on this later), the decisions related to the optimal number of principal components would be practically identical using either of the two approaches. This means that overfitting effects related to the cross-validation implementation are substantially smaller than those observed in [22], in which the LOOCV implementation contributed to some erroneous decisions. These trends indicate that the larger database size greatly helps mitigate any adverse overfitting effects. Note that the reduced accuracy estimated by the k-fold CV is an expected trend: for some of the folds, many of the removed storms belong to the parametric boundaries of the database $\mathbf{X}$, forcing extrapolations in the metamodel validation and increasing the prediction errors. The difference should therefore not necessarily be attributed to an accuracy over-prediction by the LOOCV.
Finally, the comparison for the surge metamodel between the cases with and without the node grid connectivity in the kNN imputation indicates different trends with respect to the two compared metrics. Using the node connectivity provides better accuracy for the surge score, but lower accuracy for the correlation coefficient. This should be attributed to the differences mentioned in Section 5.3.2 between these metrics; the correlation coefficient assesses accuracy with respect to the imputed surge database, and the surge score with respect to the original database. Using the grid connectivity at the imputation stage evidently provides pseudo-surge estimates with smaller smoothness, contributing to lower correlation coefficient values, but it also accommodates smaller misclassifications, as reported in Section 4.2, leading to smaller (better) surge score values. Since, as discussed in Section 5.3.2, the surge score is a more appropriate measure to assess metamodel accuracy, the results in Figure 10 indicate that for the pseudo-surge database, the use of node connectivity in the kNN implementation accommodates, ultimately, a higher metamodel accuracy.
7.2. Integration of the Two Surrogate Models
For the remaining results, the numbers of latent components selected through the parametric investigation of the previous section are utilized for the surge and classification surrogates, while all validation statistics presented correspond to the k-fold CV implementation. Results are presented for the different groups of nodes identified earlier in Table 2, and for different metamodel variants. The following variants are examined:
- (a)
Using the pseudo-surge database and relying strictly on the surge metamodel predictions. This is denoted as the surge-only implementation in the results.
- (b)
Using the pseudo-surge database, but considering the combination of the classification and surge metamodels, either using surge prioritization (abbreviated as the SP implementation in the results), classification prioritization (abbreviated as the CP implementation in the results), or the balanced combination (abbreviated as the CB implementation in the results), using the original definition for the problematic and remaining classes. For the CB, an implementation without the surge transformation (function g(.)) will also be considered, denoted as the CBNoTr implementation in the results.
- (c)
Using the pseudo-surge database and considering the balanced combination of the classification and surge metamodels for the alternative (adjusted) definition of the problematic and remaining classes.
- (d)
Using the corrected pseudo-surge database. In this case, the surge metamodel is strictly used, since it is developed based on a correct node condition database.
All variants are presented considering both the incorporation or not of the grid connectivity in the kNN imputation. Table 3 presents the averaged (across the different groups of nodes) surge score values, while Table 4 presents the averaged misclassification values. Table 5 and Table 6 present, respectively, the averaged false positive and false negative misclassification values for the different node classes. Note that in all these tables, the always-wet class corresponds to the always wet nodes, whereas the once dry group corresponds to its complement, also representing the union of the problematic and remaining classes (in their original or adjusted definitions). Figure 11, finally, presents the surge score and the misclassification for the once dry nodes as a function of the surge gap for some of the variant implementations of interest. It is important to note that when comparing across the different groups of nodes, the differences in surge score/misclassification percentages and all the associated trends are small when examining results for the always wet nodes or for groups that involve a large portion of nodes that are predominantly inundated in the original database. For this reason, emphasis in all the comparisons will be placed on groups/classes of nodes with characteristics that create challenges in the metamodel development (problematic nodes or nodes with large surge gaps). Note also that for certain comparisons, for example, when examining the different variants for the same pseudo-surge database, the results are identical for some of the groups. This is true for the group of always wet nodes.
Results present some interesting and quite complex trends. To better identify these trends, we initially restrict the comparisons to the same type of imputation process, looking separately into the different variants that adopt databases with or without node connectivity in the kNN implementation (left or right columns in each table, or same-color lines in Figure 11). Comparing first the variants that rely strictly on the surge metamodel, i.e., the surge-only implementation for the pseudo-surge database or the implementation using the corrected pseudo-surge database, it is evident that use of the pseudo-surge database leads to substantially worse accuracy across all node groups, with larger surge scores (Table 3) and larger misclassification percentages (Table 4). The lower performance stems from over-predicting the surge values, as evident from the false-positive rates shown in Table 5. This performance deteriorates further for dry nodes that are problematic or for nodes that correspond to larger surge gaps, as evident from the results in all tables. These trends are anticipated since, as discussed earlier, the pseudo-surge database includes erroneous information for these specific groups of nodes, something that substantially impacts the quality of the metamodel that is calibrated based on this information. The use of the corrected pseudo-surge database alleviates all these vulnerabilities.
A more remarkable improvement is established, though, when the integration of the classification and surge metamodels is considered. All variants using the pseudo-surge database that additionally adopt the metamodel combination improve upon the variant utilizing the corrected pseudo-surge database across both the surge score (Table 3) and the misclassification (Table 4), with the greatest improvements stemming from the reduction of the false positive misclassifications (Table 5). The improvement is more significant for the problematic nodes (original or adjusted class definition), especially the ones corresponding to large surge gaps, as evident in both Table 3 and Table 4 as well as in Figure 11. These observations showcase that the benefits from the integration of the classifier in the overall metamodel implementation are larger for nodes with problematic behavior that are predominantly dry in the original surge simulations (see the information provided in Table 2) and belong to regions with complex geomorphologies, as indicated by the large surge gap values. As the prediction of coastal hazard for such nodes could be of greater practical interest, these trends clearly demonstrate the substantial benefits that the use of the classifier can provide, stressing the importance of considering its integration when providing the surge predictions. Comparing across the two different validation metrics, greater differences are observed for the misclassification percentage than for the surge score. This trend actually holds for most of the comparisons that will be examined in this section, and it is influenced by the fact that the surge score also considers contributions from instances for which the node is correctly classified as inundated (predicted surge compared to actual surge), with these contributions being substantial in many instances. The surge score remains the most important validation metric, demonstrating how well the actual surge is explained; however, the misclassification is also of high importance, as it relates to the ability of the metamodel to identify the dry/wet boundary. As such, the larger discrepancies that exist in some of the comparisons for the misclassification accuracy should be considered as important trends.
Comparing across the different variants that consider the metamodel combination, it is evident that the alternative implementations introduced in this paper, the classification prioritization (CP) and the balanced combination (CB), outperform the original surge prioritization (SP) introduced in study [22]. These trends indicate that the classification surrogate model provides high-quality estimates, with the classification prioritization outperforming the surge prioritization, illustrating that the concerns about the reliability of the classification metamodel might not always hold (they will be database and application dependent). It is important to note that the misclassification percentages in Table 4, Table 5 and Table 6 for the SP and CP implementations facilitate a direct comparison between these two variants, as the corresponding classification predictions have been established using a single surrogate model in each case. These comparisons clearly illustrate that, for this specific case study, the classification prioritization outperforms the surge prioritization. Overall, the best variant is actually the balanced combination adopting the alternative definition regarding the problematic nodes (the adjusted problematic and remaining classes). This demonstrates that the greatest robustness in the classification predictions is obtained by the probabilistic combination of the two metamodels and the careful definition of the problematic nodes for which the surge metamodel predictions are assumed to over-predict the surge. It should be noted that the differences between the variants that consider the metamodel combination are even smaller for the surge score predictions. Beyond the reasons identified in the previous paragraph, the smaller differences stem from the fact that the classification prioritization and balanced combination variants rely on the artificial adjustment of the surge metamodel predictions for some instances, as discussed in Section 6, with the choice to set the predictions to 2 cm above the node elevation possibly not being the optimal one. Finally, the consideration of the surge transformation g(.) provides substantial benefits across most comparisons, with the exception of the surge score for nodes with large surge gaps (Table 3). Even for those exceptions, the performance of the different variants is practically identical, demonstrating a clear preference for utilizing the surge transformation. Additionally, no challenges are associated with the use of the transformation for estimating the classification probabilities according to Equation (10), as indicated by the better classification performance in Table 4, Table 5 and Table 6.
Moving now to the comparison of the two different imputation strategies, the consideration of the node connectivity leads to better misclassification performance (Table 4, Table 5 and Table 6 and bottom row of Figure 11), but not necessarily to a better surge score (Table 3 and top row of Figure 11). Even though for the surge metamodel using the pseudo-surge database the consideration of the node connectivity at the imputation stage always provides a better surge score, when the classifier is integrated into the formulation, or for the corrected pseudo-surge database, the consideration of the node connectivity reduces the surge score accuracy for the always wet nodes and the nodes that correspond to smaller surge gaps. For nodes that correspond to larger surge gaps, clear advantages are identified from using the node connectivity. Specifically, the discrepancies for the always wet nodes should be attributed to some overall smoothness reduction across the imputed database when the node connectivity is incorporated, ultimately impacting the quality of the predictions across all nodes (since the metamodel formulation is established simultaneously across the entire database). This agrees with the trends of the correlation coefficient identified earlier in Figure 10. This discussion indicates that it might be preferable to consider the development of separate surge metamodels across the different classes of nodes. Though one can envision different separations of the database for this purpose, the one investigated here is the distinction into the following two groups: the problematic nodes and the rest.
7.3. Considering Separate Surge Metamodels for Different Classes of Nodes
The development of two separate metamodels to obtain the surge predictions is examined in this section, distinguishing the nodes based on the quality of information available for them: the class of problematic nodes, and the rest of the nodes (the always-wet and remaining classes), referenced herein as “trustworthy nodes”. For the problematic nodes, the previously established predictions are utilized, while a separate metamodel is developed for the trustworthy nodes. For this portion of the database (trustworthy nodes), the pseudo-surge values are expected to have a higher degree of smoothness independent of the approach taken at the kNN imputation stage (regarding the use or not of the connectivity information), which should establish higher-accuracy surge predictions. This is the motivation for considering a separate surrogate model for them, developed without including the problematic nodes. It should be stressed that the latter nodes are expected to belong to domains with complex geomorphologies with little correlation to the remaining nodes, as clearly indicated by the large values of the maximum surge misinformation. Therefore, combining their information in the development of a single metamodel for the surge predictions across all nodes has the potential to lower the overall accuracy, since this metamodel is forced to fit portions of the database with highly dissimilar behavior.
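A minimal sketch of this split construction is shown below; `fit_surrogate` is a hypothetical stand-in for the surge surrogate pipeline of Section 5.1, and the merging of predictions by node masks is the only point being illustrated.

```python
import numpy as np

def fit_split_surrogates(X, Z_imputed, problematic, fit_surrogate):
    """Fit separate surge surrogates for trustworthy and problematic nodes.

    X             : (n, n_x) storm inputs
    Z_imputed     : (n, n_z) imputed pseudo-surge database
    problematic   : (n_z,) boolean mask of problematic nodes
    fit_surrogate : callable (X, Z_subset) -> predictor with .predict(X_new)
    """
    trustworthy = ~problematic
    model_trust = fit_surrogate(X, Z_imputed[:, trustworthy])
    model_prob = fit_surrogate(X, Z_imputed[:, problematic])

    def predict(X_new):
        Z_hat = np.zeros((X_new.shape[0], Z_imputed.shape[1]))
        Z_hat[:, trustworthy] = model_trust.predict(X_new)
        Z_hat[:, problematic] = model_prob.predict(X_new)
        return Z_hat

    return predict
```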
To accommodate the development of the two independent surge metamodel predictions (one for the problematic nodes and one for the trustworthy nodes), the formulation of the surge surrogate model (Section 5.1) is repeated twice, once for each of the respective databases, and the overall predictions of the storm surge metamodel are ultimately obtained by combining them. All other steps regarding the combination with the classification metamodel and the validation remain the same. Table 7, Table 8, Table 9 and Table 10 present the validation results in identical format to Table 3, Table 4, Table 5 and Table 6. The difference is that in these tables, the surge predictions are established by using different surge metamodels for the different classes of nodes.
Results indicate that the consideration of a separate surge metamodel for the trustworthy nodes improves overall the quality of the predictions for them with respect to both the surge score (compare Table 3 to Table 7) and the misclassification percentage (compare Table 4, Table 5 and Table 6 to Table 8, Table 9 and Table 10) validation metrics. This is especially evident in the comparisons for nodes belonging to the always-wet and remaining classes (original or adjusted definition). Note that for the problematic class, results remain the same, since the surge metamodel for these nodes is the one that was used previously. For this reason, the results for the groups of nodes corresponding to large surge gap values also remain unchanged, since a significant portion of the nodes in those groups belongs to the problematic class. The remaining trends, already identified in Section 7.2 with respect to the benefits of the integration of the surge classifier and the superiority of the balanced combination approach, are the same, experiencing simply an overall improvement in accuracy due to the higher quality of surge predictions for the trustworthy nodes. The overall predictions for the database imputed without the node connectivity are still better than the predictions for the database imputed with the node connectivity, though the differences in this case are smaller, and the improvement for the trustworthy nodes is greater when separate surge metamodels are considered. The better overall performance for the database imputed without the node connectivity stems primarily from the nodes in the problematic class.
The above discussion shows the advantages of separating the portions of the database with different surge behavior (in this case between problematic and trustworthy nodes) and considering separate surge surrogate models for each of them. Though this objective could perhaps be accomplished through the identification of principal components within PCA, the linear character of PCA evidently prohibits it from establishing a complete separation of the portions of the database with substantially dissimilar behavior, since some (linear) correlation between these portions evidently exists. Only if these portions were completely uncorrelated would PCA be able to accommodate the desired separation. Unless a different dimensionality reduction technique is adopted, the formal separation of the database into two different groups is the only means for achieving the desired objective.