**6. Statistical Analyses**

The prior analysis compared EC forecasts from the Regression and WARMF models and examined how the differences between model predictions and observations change over forecast lead times ranging from Δ Day + 0 to Δ Day + 14. Correspondence between model EC forecasts and EC observations exhibited significant variability. In general, as expected, the differences between model predictions of EC and observations increase with forecast lead time. Consequently, the question arises as to the lead time up to which the model forecasts of EC can be considered reasonably reliable. In this section, a statistical approach to comparing the means of the observations and model EC forecasts is described.

The application of statistical testing methods for comparing the two models requires that careful consideration be given to the underlying assumptions made in the analysis. A preliminary decision is which statistical property should be tested. For the statistical analysis, a comparison of observation and forecast means was selected, consistent with the prior analysis, because mean salinity load, the product of the mean concentration (EC) and the mean flow, is the parameter of primary interest.

In general, most environmental data do not follow a normal distribution, as will be demonstrated for the observed EC monitoring data presented in this study. This fact has important implications for the statistical tests that can be employed to test the equivalence of the observation and forecast EC mean values. The classical t-test assumes the data are normally distributed. If they are not normally distributed, it might be possible to transform the data (e.g., using logarithmic transformations) so that, when plotted, they appear normally distributed. Such transformations can sometimes complicate the interpretation of the results. Non-parametric methods, which do not assume the data are normally distributed, are tests on the median values of the sampled data and therefore are not appropriate for this study. Another approach is the use of a permutation test. This method generates a large number of random rearrangements of the underlying data to build a reference distribution of the test statistic (here, the difference in means) without assuming that the data are normally distributed. This is the statistical analysis approach chosen for this study.

The application of these methods was accomplished with the use of the R Commander software platform (R version 3.5.3, "Great Truth"; Copyright (C) 2019 The R Foundation for Statistical Computing). R is free software distributed under the GNU General Public License. Additionally employed in the analysis were several R scripts developed by Practical Statistics Inc. and made available through their Applied Environmental Statistics courses. The statistical methods deployed in the analysis that follows were chosen based on their relative accessibility and the perception that these could be easily explained to program participants and interested stakeholders. Given the differences in the ways each of the models has been deployed for forecasting (one run daily and the other weekly), it was thought necessary to address these potential biases through the use of standard, well-recognized methods. These included the methods applied below: boxplots, the Shapiro–Wilk test of normality, the Fligner–Killeen test of equal variances, linear regression, and the matched pair permutation test.


The results of these analyses are presented for a selected lead time of Δ Day + 12, representing the late forecast period. Boxplots comparing the Regression (Figure 13a) and WARMF (Figure 13b) model EC forecasts with the observed EC data are shown in Figure 13. Boxplots are visual tools that can be used to indicate whether the data are normally distributed. If the distribution were normal, the box would be divided into equal (blue) areas by the median (black line), and the dashed lines representing the data range would have equal lengths above and below the box. As illustrated, these conditions are not met by the EC observations or the EC forecasts for either model. The Shapiro–Wilk test is a statistical test used to evaluate whether data are normally distributed. Commonly, a *p*-value of less than 0.05 is considered indicative of a non-normal distribution. As shown in Figure 13, the *p*-values are considerably less than 0.05, confirming the boxplot interpretation. At forecast lead time Δ Day + 12, the boxplots in Figure 13 suggest that neither the observed EC nor the model forecast EC is normally distributed, but that the two have similar variances.

**Figure 13.** (**a**,**b**). Boxplots of observed EC and forecast EC by the Regression (**a**) and WARMF (**b**) models are shown for forecast lead time Δ Day + 12. Fligner–Killeen variance *p*-values are 0.6244 and 0.2703 for the Regression and WARMF models, respectively.
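The visual check described above can be made concrete with a short sketch. The Python function below is illustrative only and is not one of the R scripts used in this study; it computes the quantities a Tukey-style boxplot draws so that the two halves of the box and the two whiskers can be compared numerically. The function name, the 1.5 × IQR whisker rule, and the sample values are assumptions introduced for illustration, and symmetry of a boxplot is only a rough indication of normality, not a formal test.

```python
import statistics


def boxplot_summary(values):
    """Return the quantities a Tukey-style boxplot draws for `values`.

    Assumption: whiskers extend to the most extreme data point within
    1.5 * IQR of the box, which is a common plotting default.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_whisker = min(x for x in data if x >= q1 - 1.5 * iqr)
    upper_whisker = max(x for x in data if x <= q3 + 1.5 * iqr)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Heights of the two halves of the box on either side of the median.
        "lower_box": median - q1,
        "upper_box": q3 - median,
        # Lengths of the whiskers below and above the box.
        "lower_whisker_len": q1 - lower_whisker,
        "upper_whisker_len": upper_whisker - q3,
    }


# Hypothetical, right-skewed EC values (not data from this study).
example_ec = [310, 325, 340, 355, 370, 400, 450, 520, 640, 820]
summary = boxplot_summary(example_ec)
print(summary)
```

For a roughly symmetric sample, `lower_box` and `upper_box` would be similar, as would the two whisker lengths; for the skewed sample above, the upper half of the box and the upper whisker are clearly longer.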

Scatterplots of observed EC data against the Regression and WARMF model EC forecasts are shown in Figure 14a,b, respectively, with their linear regression lines superimposed. The Regression model EC forecasts show slightly less scatter around the "best fit" regression line than the WARMF model EC forecasts. However, neither model yields a high R-squared value, which indicates a poor fit for both.

**Figure 14.** (**a**,**b**). Calculated linear regression relationship (solid blue line) for the Regression (**a**) and WARMF (**b**) models together with a scatterplot of the underlying observed EC data and model forecast EC for lead time Δ Day + 12.
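As an illustration of how the fitted line and its goodness of fit in such a plot can be computed, the following Python sketch fits an ordinary least-squares line and reports R-squared and adjusted R-squared. It is not the R code used in this study. Treating the forecast EC as the predictor and the observed EC as the response is an assumption, as are the function names and the sample values.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Fraction of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, n_predictors=1):
    """Penalize R-squared for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


# Hypothetical paired values (not data from this study).
forecast_ec = [420, 455, 480, 510, 560, 600, 640, 700]
observed_ec = [400, 470, 450, 540, 530, 650, 610, 760]

slope, intercept = fit_line(forecast_ec, observed_ec)
r2 = r_squared(forecast_ec, observed_ec, slope, intercept)
print(slope, intercept, r2, adjusted_r_squared(r2, len(observed_ec)))
```

With a single predictor, the adjusted value is only slightly below R-squared unless the sample is small, so the two statistics lead to the same qualitative reading of fit.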

Figure 15 shows the histograms and *p*-values associated with the matched pair permutation test for both the Regression (Figure 15a) and WARMF (Figure 15b) model EC forecasts at forecast lead time Δ Day + 12. The results of the matched pair permutation test indicate that neither the Regression model nor the WARMF model EC forecasts are good representations of the observed EC values at lead time Δ Day + 12. The Regression model EC has a *p*-value greater than 0.05 (0.1021), while the WARMF model EC has a *p*-value less than 0.05 (0.0283).

**Figure 15.** (**a**,**b**). Histograms of the mean differences between observed EC and model forecast EC for the Regression (**a**) and WARMF (**b**) models for model forecast lead time Δ Day + 12.
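The logic of a matched pair permutation test can be sketched as follows. This Python example is a minimal illustration under stated assumptions and is not the Practical Statistics R script used in this study: it assumes a two-sided test on the mean of the paired differences, builds the reference distribution by randomly flipping the sign of each difference, and applies an add-one correction to the *p*-value. The function name, the number of resamples, and the seed are hypothetical.

```python
import random


def paired_permutation_test(observed, forecast, n_resamples=10000, seed=0):
    """Two-sided matched pair permutation test on the mean difference.

    Under the null hypothesis of equal means, each paired difference is
    equally likely to be positive or negative, so the reference
    distribution is built by randomly flipping the signs of the differences.
    """
    diffs = [o - f for o, f in zip(observed, forecast)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    rng = random.Random(seed)

    at_least_as_extreme = 0
    for _ in range(n_resamples):
        permuted_mean = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(permuted_mean) >= abs(mean_diff):
            at_least_as_extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_resamples + 1)
    return mean_diff, p_value


# Hypothetical paired values (not data from this study).
observed_ec = [400, 470, 450, 540, 530, 650, 610, 760]
forecast_ec = [420, 455, 480, 510, 560, 600, 640, 700]
print(paired_permutation_test(observed_ec, forecast_ec))
```

A histogram of the permuted means would correspond to the type of plot shown in Figure 15, with the *p*-value given by the share of permuted means at least as far from zero as the observed mean difference.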

In addition to the selected lead time presented above, adjusted R-squared values and matched pair permutation tests were computed for EC predictions from both the Regression and WARMF models for all EC forecast lead times from Δ Day + 0 to Δ Day + 14. Figure 16 shows the adjusted R-squared values for both models. As illustrated, the Regression model has higher adjusted R-squared values than the WARMF model throughout the forecast period, indicating a better goodness of fit. However, the adjusted R-squared values for both models decline progressively over the forecast period, indicating a declining goodness of fit at longer lead times.

The results of the matched pair permutation tests comparing the means of the observed EC and forecast EC for both the Regression and WARMF models are shown in Figure 16.

**Figure 16.** Adjusted R-squared values for the Regression and WARMF models for all EC forecast lead times.

The results of the statistical analyses are summarized as follows:

