2.3.1. First Step: Triple Collocation

Although a phase-decoupled approach is employed in order to reproduce the qualitative behavior of changes and spatial redistribution in the wave direction, when significant diffraction conditions are detected (e.g., in front of reflecting obstacles like rocky coasts), a Boussinesq-type model is required. However, due to the large extension of the spatial domain and the large wave dataset and computing effort normally required for the computation of diffraction in arbitrary geophysical conditions, a different approach was applied in the present study. Taking into account the way in which MIKE 21 SW solves the equations, better results are normally obtained with waves that are parallel to one of the coordinate axes or (if just one boundary is used to force the model) perpendicular to the offshore boundary from which the input waves are coming [137]. Therefore, the input open boundary should be placed so that the main wave direction is orthogonally aligned. Looking to a large scale, the exchanges between the Gulf of Naples and the Tyrrhenian Sea occur along the Bocca Grande, the main aperture of the gulf between Ischia and Capri. Considering that Bocca Grande opens to the west into the Mediterranean Sea and the bulk of the waves are provided by westerly waves [5], firstly, the open boundary of the numerical domain was orthogonally aligned along the 220◦ N direction. For convenience, we will call this domain M220◦. Under this domain, the numerical model was forced with the hindcast data by ECMWF (grid reference point E3). Since two sources of direct meausurement of the "physical truth" (with certain systematic deviations and random errors) are available (i.e., ADCP-B and DWSD-B datasets), the search for the best set of model parameters (breaking parameter, bottom friction and white capping) was iteratively obtained by firstly applying a triple collocation procedure. This is the singularity of the present study, for which the third dataset is not univocally defined: the triple collocation is not used just for error estimation purposes but to help the search for the best calibration.

The term "triple collocation" indicates a methodology used to characterize systematic biases and random errors in satellite observations, model fields and in situ measurements. It attempts to segregate the measurement uncertainties, spatial and temporal representation and sampling differences in the different datasets by an objective method [137,138]. A frequent and often biased assumption is that all errors are due to the system that is being tested against a reference system, that is in turn assumed perfect. In this vein, Stoffelen [38] also refers to the biases associated with regression and with error distributions. These issues cannot be clearly resolved in dual comparisons, as scatter will be caused simultaneously by all the above issues for both observing systems, and there is no clear objective way to assign errors to one or the other [139]. In triple collocation, instead, three (ideally) independent datasets are brought together, so that three scatter plots can be made. Focusing on the specific application to coastal engineering, and referring to studies provided by Robertson et al. [45], Muraleedharan et al. [42] and McColl et al. [46], the technique needs the assumption of a linear relationship between the measured value and true value. The following equation can be considered:

$$X\_i = \alpha\_i + \beta\_i T + \mathfrak{e}\_i \tag{10}$$

where the *Xi (i* ∈ *{1, 2, 3})* are collocated measurement systems linearly related to the true underlying value *T* with additive random errors *ei*, respectively. The terms α*<sup>i</sup>* and β*<sup>i</sup>* are unknown calibration parameters representing bias and the linear calibration coefficient (i.e., the ordinary least squares intercepts and slopes, respectively).

Given that the true value *T* is unknown, this method requires that one of the datasets is defined as the reference. However, as noted by Janssen et al. [40], the choice of *T* does not affect the results. The ADCP-B dataset was defined as the reference dataset; hence, its α and β will be set to 0 and 1, respectively.

The first step removes α from the datasets by introducing the following new variables:

$$X\_i' = X\_i - \alpha\_i \tag{11}$$

A new set of equations, without *T*, result from inserting (11) into (10). The uncertainty term *e* in (10) can be modified to:

$$X\_{i}^{\prime\prime} - T = \frac{X\_{i}^{\prime}}{\beta\_{i}} - T = \frac{\varepsilon\_{i}}{\beta\_{i}} = \varepsilon\_{i}^{\prime\prime} \tag{12}$$

By calculating the difference between any two of above equations, the true value can be eliminated, obtaining the following:

$$X\_{1}^{\prime\prime} - X\_{2}^{\prime\prime} = e\_{1}^{\prime\prime} - e\_{2}^{\prime\prime}$$

$$X\_3'' - X\_2'' = \mathfrak{e}\_3'' - \mathfrak{e}\_2''$$

$$X\_1'' - X\_3'' = \mathfrak{e}\_1'' - \mathfrak{e}\_3''$$

Assuming the errors from the independent sources have zero mean and are uncorrelated with each other and with *T*, error terms can then be calculated by multiplying any of the two equations above, introducing the mean values:

$$\begin{aligned} \overline{(X\_1'' - \overline{X\_2''})} \left( \overline{X\_3''} - \overline{X\_2''} \right) &= \left( \varepsilon\_1'' - \varepsilon\_2'' \right) \left( \varepsilon\_3' - \varepsilon\_2'' \right) = \left( \varepsilon\_2'' \right)^2 \\\ \overline{(X\_1'' - \overline{X\_3''})} \left( \overline{X\_2''} - \overline{X\_3''} \right) &= \left( \varepsilon\_1'' - \varepsilon\_3'' \right) \left( \varepsilon\_2'' - \varepsilon\_3'' \right) = \left( \varepsilon\_3'' \right)^2 \\\ \overline{(X\_1'' - \overline{X\_3''})} \left( \overline{X\_1''} - \overline{X\_2''} \right) &= \left( \varepsilon\_1'' - \varepsilon\_3' \right) \left( \varepsilon\_1' - \varepsilon\_2'' \right) = \left( \varepsilon\_1' \right)^2 \end{aligned} \tag{14}$$

Then, according to Janssen et al., [40], it is possible calculate the linear calibration coefficient for the *X2* and *X3* datasets (the DWSD-B datasets and the numerical output, respectively, in the present study). It can be calculated as:

$$\beta\_2 = \left( -B\_2 + \frac{\sqrt{B\_2^2 - 4A\_2 C\_2}}{2A\_2} \right) \tag{15}$$

$$\beta\_3 = \left( -B\_3 + \frac{\sqrt{B\_3^2 - 4A\_3C\_3}}{2A\_3} \right) \tag{16}$$

where A2 = *r*2*X* 1*X* <sup>2</sup>, *r*<sup>2</sup> = *e*1/*e*2, *B*<sup>2</sup> = *X* 1 <sup>2</sup> − *r*2*X* 2 2, *C*<sup>2</sup> = *X* 1*X* <sup>2</sup>, and A3 = *r*3*X* 1*X* <sup>3</sup>, *r*<sup>3</sup> = *e*1/*e*3, *B*<sup>3</sup> = *X* 1 <sup>2</sup> − *r*3*X* 3 2, *C*<sup>3</sup> = *X* 1*X* 3.

Hence, the bias is calculated as:

$$
\alpha\_2 = \overline{X\_2} - \beta\_2 \overline{X\_1} \tag{17}
$$

$$
\alpha\_3 = \overline{X\_3} - \beta\_3 \overline{X\_1} \tag{18}
$$

assuming as initial values for the iterative method that α<sup>2</sup> = α<sup>3</sup> = 0 and β<sup>2</sup> = β<sup>3</sup> = 1. The iterative process ends when either of the bias, beta or error variance converge [45]. For this study, convergence was based on the error variance.

Operatively, the triple collocation procedure has been repeated several times with different numerical outputs (*X3*) obtained each time after arbitrary modification of model parameters. It was assumed, in particular, that the calibration of the numerical output was achieved when the error variance estimated by means of the triple collocation between *X3* and *X2* and *X3* and *X1* were both smaller than the error variance between *X1* and *X2*.

For the qualitative evaluation of the comparison results, statistical indicators such as bias and root mean square error (*RMSE*) were used. These parameters are defined as:

$$Bias\{X\_i, X\_j\} = \frac{1}{N} \sum\_{n=1}^{N} \left(\mathbf{x}\_{j,n} - \mathbf{x}\_{i,n}\right) \tag{19}$$

$$RMSE(\mathbf{X}\_i, \mathbf{X}\_j) = \sqrt{\frac{1}{N-1} \sum\_{n=1}^{N} \left(\mathbf{x}\_{j,n} - \mathbf{x}\_{i,n}\right)^2} \tag{20}$$

where *xj* and *xi (i,j* ∈ *{1, 2, 3}, ij)* indicate the wave parameters at the n-th hourly sea state, respectively, measured by the DWSD buoy, the ADCP or provided by numerical runs, and *N* is the total number of hourly sea states considered for the field test campaign. The notation is such that capital letters

represent random variables, and lower-case letters represent realizations of random variables. We also may derive the bivariate correlations between the measurement sources as:

$$\Gamma = \frac{\text{COV} \{ \chi\_{i\prime} \chi\_{j} \}}{\sigma\_{X\_{i}} \sigma\_{X\_{j}}} \tag{21}$$

obtained by defining the second-order quantities that are estimable directly from sample measurements, i.e., the covariance, COV(Xi,Xj), and the standard deviation, σ, of two datasets.

The key benefits of the first step can be summarized as follows:


These benefits, used together under the triple collocation technique, allow for an efficient rough calibration.

#### 2.3.2. Second Step: Ensemble of Multiple Runs

The small gulf of Bagnoli-Coroglio Bay represents a sub-basin in the northwestern end of the Gulf of Naples. Water exchange occurs between Pozzuoli Bay and the Gulf of Naples through a section that is 100 m deep and 2 km wide. Considering its wave sector, in addition to M220◦, the other four different offshore boundary orientations have been applied (M180◦, M240◦; M260◦; M280◦), as shown in Figure 9.

**Figure 9.** Bathymetry implemented for each boundary orientation: (**a**) model M180◦; (**b**) model M220◦; (**c**) model M240◦; (**d**) model M260◦; (**e**) model M280◦.

The main idea was that the model results coming from M220◦ can be improved by weighing the contributions of each sub-model. Hence, equal numbers of wave sectors from the whole offshore dataset have been propagated within the respective oriented model, as explained in Table 8. In other words, the 1-year wave time series was divided into five sub-series, collecting waves by the five directional ranges. The angular width of such wave sectors is not equal but is chosen concurrently, considering that the morphological characteristics of the bay and offshore directional wave rose.


**Table 8.** Geographical information of nearshore study sites.

At their first run, the set of model parameters found for M220◦ was considered. The procedure is synthesized in the scheme reported in Figure 10.

The whole dataset of resulting wave patterns obtained by means of the multi-domain procedure can be seen as not significantly affected by neglecting diffraction (i.e., not significantly different in comparison to adopting a diffraction model using just five scenarios along the mean five directions).

The calibration of each domain has been carried out, comparing the numerical results of a 1-year wave time series at the control point corresponding to the DWSD-A location, focusing only on waves coming from the reference wave sector. Then, the search for the optimal tuning parameters was carried out iteratively.

Concerning this iterative operation, it is of significative importance that an enhancement factor is introduced into the input series. In fact, after a first run for all the five boundary orientations, a sort of large bias was detected. Even if a calibration in phase one was carried out, the gross error remained too high, especially for larger wave heights. This discrepancy was not found when the geographically transposed dataset (at point O) was used to force the numerical model. Therefore, in order to use the ECMWF dataset (with the advantage of 40 years of continuous wave data), a rough correction parameter had to be applied (i.e., the enhancement factor) and then the finer calibration procedure could be applied. As previously highlighted, the use of a reanalysis product (the ERA-Interim dataset) as input for the numerical model leads to a general underestimation of the wave height. Considering that these differences can be attributed mainly to the dissimilar measurement conditions, an estimation of the discrepancies between ECMWF and IWN buoy records (the one at Ponza buoy and at point O, by transposition) is available in Section 2.1.1. In this work, the value of 1.42 (see Table 3) has been applied to amplify the wave height time series at point E3. The amplification factor was not necessary in the first step. This can be attributed to the small range of wave heights collected at MEDA B. As highlighted in Section 2.1.1, no relevant differences between reanalysis and direct measurements were detected for *Hm0* < 0.75 m.

Finally, the resulting five numerical series were assembled in order to reconstruct the 1-year nearshore dataset. The main information for the assemblage was represented by the offshore wave direction provided at point E3. The logic description of the assemblage algorithm is reported in Appendix A.

**Figure 10.** Flow chart of the two-phase calibration procedure.

#### **3. Results**

#### *3.1. DWSD Buoy Compared to the ADCP*

Figure 11a shows the time series of the significant wave height *Hm0* measured by the two wave instruments, presenting very good agreement considering the very different instrumental techniques that were used. It can be noted that the time series plot between the DWSD-B and ADPC-B looks very similar, without any significant deviation, especially when the significant wave height exceeded 0.5 m. In Figure 11b, the comparative analysis of the significant wave height clearly shows a good correlation between the simultaneous buoy data and the ADCP, with a bias of 0.038 m, *RMSE* of 0.07 m and a correlation coefficient *R* of 0.96. Such a strong agreement between two wave sensors of totally different natures confirms the high quality of this cheap DWSD. For practical coastal engineering applications, sea states with values of *Hm0* lower than 0.50 m are in many cases considered as calm conditions. If only the sea states greater than this threshold are considered in the comparison between DWSD and ADCP, then the results show excellent correlation, with a bias of of 0.011 m and *RMSE* of 0.05 m. The wave height correlation coefficient between sources is Γ = 0.87.

**Figure 11.** (**a**) Data series of the significant wave height *Hm0* between DWSD-B buoy and ADCP-B (from 12 May to 18 May 2016); (**b**) comparison of the significant wave height *Hm0* between the two instruments.

The correlation of the peak period Tp between the ADCP-B and the DWSD-B is shown in Figure 12a. The peak period considered in the analysis refers to the peak related to the wind–sea spectra (0.11 Hz < *f* < 0.49 Hz). The results of peak periods show some small differences between the two instruments, mainly when the calculated wave spectra have multiple peaks in the wind sea frequency range (0.11 Hz < *f* < 0.49 Hz), of approximately equal magnitude, leading to some difficulties in the correct evaluation of the *Tp*. Small stochastic effects may easily modify the spectral peak, yielding a slightly different peak period. Figure 12b shows the comparison of the peak period for each sea state, showing a bias of −0.1 sec and an *RMSE* for the 1.1 section.

**Figure 12.** (**a**) Data series of the peak period Tp measured with the DWSD-B buoy and the ADCP-B (from 12 May to 18 May 2016); (**b**) comparison of the peak period Tp between the two instruments.

#### *3.2. Triple Comparison*

After the implicit validation between ADCP and the DWSD buoy was obtained, the M220◦ MIKE 21 model was calibrated. The breaking parameter, bottom friction and white capping were tuned in order to provide better wave predictions. In the present work, the iterative procedure ends when the bias related to Hs between numerical output and DWSD buoy is less than the one between ADCP-B and DWSD-B (i.e., <0.038). In particular, a bias = 0.03 m was reached.

Figure 13 shows a brief time series of the different datasets. The original ECMWF dataset (without the enhancement factor) is depicted. It is worth noting that the available data for triple comparison are few and, in particular, the range of measured wave height is between 0.4 m and about 1 m. This reinforces the need for a second stage of calibration.

**Figure 13.** Observed and modeled time series of hourly averaged wave height. The observed time series of 6-hourly averaged wave height at the ECMWF grid point E3 (offshore of Gulf of Naples) is also reported.

It is also important to note the smoothed signal for the hindcast data and, hence, the numerical model, due to the smaller sampling frequency. By means of the small arrows in Figure 13, we also reported the offshore wave direction at point E3, which ranged between 232◦ N to 274◦ N. It can be noted that for waves coming from 230◦–250◦ N, the numerical model fits very well with the direct measurements (in this vein, it could be seen as the envelope curve for those time series). On the other hand, when the direction became higher than 250◦–255◦ N, the error significantly increased. This evidence corroborates the use of a multi-domain approach in order to overcome the intrinsic limits of the numerical model in relation to diffraction issues.
