*Article* **STORAGE (STOchastic RAinfall GEnerator): A User-Friendly Software for Generating Long and High-Resolution Rainfall Time Series**

**Davide Luciano De Luca 1,\* and Andrea Petroselli <sup>2</sup>**


**Abstract:** The MS Excel file with VBA (Visual Basic for Application) macros named STORAGE (STOchastic RAinfall GEnerator) is introduced herein. STORAGE is a temporal stochastic simulator aiming at generating long and high-resolution rainfall time series, and it is based on the implementation of a Neymann–Scott Rectangular Pulse (NSRP) model. STORAGE is characterized by two innovative aspects. First, its calibration (i.e., the parametric estimation, on the basis of available sample data, in order to better reproduce some rainfall features of interest) is carried out by using data series (annual maxima rainfall, annual and monthly cumulative rainfall, annual number of wet days) which are usually longer than observed high-resolution series (that are mainly adopted in literature for the calibration of other stochastic simulators but are usually very short or absent for many rain gauges). Second, the seasonality is modelled using series of goniometric functions. This approach makes STORAGE strongly parsimonious with respect to the use of monthly or seasonal sets for parameters. Applications for the rain gauge network in the Calabria region (southern Italy) are presented and discussed herein. The results show a good reproduction of the rainfall features which are mainly considered for usual hydrological purposes.

**Keywords:** rainfall generator; stochastic processes; STORAGE; VBA macros in Excel

#### **Software Information**


#### **1. Introduction**

Many hydrological applications, mainly related to small and ungauged catchments that are characterized by a short response time of runoff to rainfall, require the use of continuous rainfall time series at high resolutions [1]. However, these data series usually present a very short sample size or they are absent for many sites of interest, where only

**Citation:** De Luca, D.L.; Petroselli, A. STORAGE (STOchastic RAinfall GEnerator): A User-Friendly Software for Generating Long and High-Resolution Rainfall Time Series. *Hydrology* **2021**, *8*, 76. https://doi.org /10.3390/hydrology8020076

Academic Editors: Kwok-Wing Chau

Received: 8 April 2021 Accepted: 29 April 2021 Published: 3 May 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Annual Maximum Rainfall (AMR) series are available (but they are often not so long at the finest time scale, e.g., 1 or 5 min, [2]). In this context, the use of Stochastic Rainfall Generators (SRGs) appears helpful for a more in-depth analysis of rainfall processes [3,4]. SRGs generally present a simple mathematical formulation and low computational costs, and large ensembles of long rainfall time series can be quickly obtained [5]. Moreover, an SRG can be easily used for obtaining perturbed time series [6,7] that are representative of future rainfall on hydrological scales, which are finer than the spatial and time scale investigated by Regional Climate Models (RCM). In fact, concerning this latter aspect, RCM outputs are mainly available at daily scale and are averaged over large spatial resolutions, so they require statistical downscaling or bias correction methods [8,9] for hydrological analyses. Only very recent RCM applications regarded high resolutions (hourly) and small spatial scales (e.g., [10,11]).

Specifically, the Poisson cluster models are the SRGs widely used in literature [12–33]; they include the Neyman–Scott and Bartlett–Lewis families, which provide similar performances [34]. These models can satisfactorily recreate the observed summary statistics (used for calibration) within the generated rainfall series at several fine time scales, but they usually underestimate extreme value distributions on hourly and sub-hourly scales (e.g., [35]). Thus, many variants were proposed, aimed at overcoming this problem. Unfortunately, they implied:


Moreover, other kinds of SRGs, based on different stochastic engines than Poisson cluster models, were also proposed in literature [45,46].

Recent works [6,7,47] investigated the possibility to calibrate an SRG (i.e., to carry out the parametric estimation, on the basis of available sample data, in order to better reproduce some features of interest) by using only sample series at coarser time scales, which are usually longer than continuous data with a high resolution. In this framework, a modified version of the Neymann–Scott Rectangular Pulse (NSRP) model was implemented with Visual Basic for Application (VBA) macros in MS Excel, and the realized software, named STORAGE, is discussed in the present work. STORAGE is the acronym of STOchastic RAinfall GEnerator and its innovative aspects, with respect to other SRGs proposed in literature, can be summarized as follows:


Concerning the latter aspect, the proposed approach is very flexible, because it is possible to model seasonality:


Obviously, this methodology can be applied for any SRG proposed in literature.

The present manuscript is organized as described in the following. A brief overview of the investigated study area, i.e., the rain gauge network of the Calabria region in southern Italy, is presented in Section 2. The theoretical background of the STORAGE model and the user-friendly interface are described in Section 3. Applications are then discussed in Section 4, and the conclusions are drawn in Section 5.

#### **2. Study Area**

The investigated study area is the rain gauge network of the Calabria region (southern Italy). The employed data were downloaded from the website of the Multi Risks Centre of the Calabria region [48]. In particular, authors selected as reference the rain gauges with at least 30 years of observed data concerning AMR series with rainfall durations from 1 to 24 h. In total, time series of AMR, annual and monthly cumulative rainfall values, and annual number of wet days were analyzed for 64 stations (Figure 1). It is noteworthy that in Italy a day is classified as wet if the daily rainfall is greater than or equal to 1 mm.

**Figure 1.** Location of the investigated rain gauges (yellow and red dots) in the Calabria region (southern Italy). The stations characterized by red dots (Montalto Uffugo, Reggio Calabria and Vibo Valentia) are described in detail in the present manuscript (see Section 4).

The Calabria region is located in the central part of the Mediterranean area and the total area is about 15,000 km2; the territory is hilly in 49.2% and mountainous in 41.8% of the total area. From the collected data, the mean annual precipitation (MAP) assumes an average value of about 1150 mm, with higher values in mountainous areas and lower values in the coastal areas (particularly on the north-eastern one). As explained in [49], many rainfall events are induced by cyclones that develop close to the Alps and in the western part of the Mediterranean, and impact on the Tyrrhenian side, moving from west to east. Cyclones from North Africa and the Balkans are less frequent and mainly affect the region eastern side. In general, in the western part of Calabria there are the greatest rainfall amounts, while in the eastern part the most extreme events occur, as they are exposed to more intense cyclones [50].

#### **3. Methods**

#### *3.1. Theoretical Overview of the Implemented Model*

The basic version of the NSRP model [13,51] is suitable for stationary (i.e., without any seasonality and trend) continuous rainfall processes. In such model, five quantities, which are considered as random variables, hence following assigned probability distributions, play a crucial role. In detail, the five quantities are (see also Figure 2):

**Figure 2.** Representation of the Neyman–Scott Rectangular Pulses (NSRP) stochastic process for at-site rainfall modeling. In the upper part of the Figure, 2 storm occurrences (red dots) with an inter-arrival *ts*, 2 bursts for the first storm and 1 burst for the second storm, are represented. The corresponding waiting times, intensities and duration are also indicated. Then, in the lower part of the Figure, the total precipitation intensity at time *t* can be calculated as the sum of all the intensities from the active bursts at time *t*.

• the inter-arrival time, *Ts*, between the origins of two consecutive storms, which is assumed to be an exponential random variable. Consequently, the probability *P*[*Ts* ≤ *ts*] to have a new storm origin after a waiting time *Ts* ≤ *ts* from the previous one can be calculated as:

$$P\_{T\_S}(t\_S) = P[T\_\ $ \le t\_\$ ] = 1 - e^{-\lambda \cdot t\_S} \tag{1}$$

where 1/*λ* represents the mean value for the inter-arrival times, i.e., *E*[*Ts*] = 1/*λ*;


$$P\_W(w) = P[W \le w] = 1 - e^{-\beta \mu \cdot w} \tag{2}$$

with *E*[*W*] = 1/*βW*;

• the intensity *I* and the duration *D* of a specific burst, having a rectangular shape, belonging to a storm. Both *I* and *D* are assumed as exponentially distributed, with

parameters *β<sup>I</sup>* and *βD*, respectively, and mean values *E*[*I*] = 1/*βI*, *E*[*D*] = 1/*βD*, so that:

$$P\_I(i) = P[I \le i] = 1 - e^{-\beta\_I \cdot i} \tag{3}$$

$$P\_D(d) = P[D \le d] = 1 - e^{-\beta\_D \cdot d} \tag{4}$$

By considering all these five mentioned quantities, the total precipitation intensity *Y*(*t*) at time *t* is then calculated as the sum of all the intensities from the active bursts at time *t* (see also Figure 2), and the rainfall height *R*(*τ*) *<sup>j</sup>* , aggregated on the temporal *τ* resolution and related to the time interval *j* with extremes (*j* − 1)*τ* and *jτ*, is:

$$R\_j^{(\tau)} = \int\_{(j-1)\cdot\tau}^{j\cdot\tau} Y(t) \cdot dt \tag{5}$$

An SRG model such as NSRP is usually calibrated by minimizing an Objective Function (OF), which is defined as the sum of residuals (normalized or not) concerning the considered (by user) statistical properties of the observed data at chosen time resolutions and their theoretical expressions. The statistical properties are typically referred to highresolution continuous time series (e.g., 1 or 5-min rainfall time series): mean, variance, and k-lag autocorrelation for *R*(*τ*) *<sup>j</sup>* , at several values of *τ* can be mentioned as examples.

A first crucial aspect of the NSRP model is represented by the seasonality modelling of the rainfall process, for which monthly or seasonal parameter sets are usually considered, i.e., by carrying out a specific calibration for each considered month or season. This procedure clearly implies an increase in the number of the parameters to be estimated, and then a reduced ratio data/parameters.

In this context, another important aspect emerges, i.e., continuous high-resolution data sets are typically very short (in general no more than 15–20 years) or absent in many locations, and then a calibration with these data sets could not be suitable for a robust estimation of parameters.

To overcome these critical issues, a modified version of NSRP was implemented in STORAGE software, which is discussed in this work. STORAGE represents the implementation of the framework presented in [6,7], and its innovation regards the following features:

	- by using goniometric series only for some rainfall descriptors, and by considering the other ones as invariant during the year;
	- by setting the maximum number of harmonics for each selected descriptors, with the goal of having a parsimonious model.

Obviously, like for other SRGs proposed in literature, a transient version can be implemented [6,7] in order to obtain perturbed synthetic series, which are representative of future hypothesized rainfall scenarios on spatial and temporal hydrological scales. However, in this work we describe only the implementation in STORAGE software of the cycle-stationary process (i.e., without temporal trends).

3.1.1. Seasonality Modelling with Goniometric Series

Focusing on the five NSRP summary statistics:


The adoption of different sets for each month would imply the estimation of 60 parameters. Alternatively, it is possible to use goniometric series for the seasonal variation of an investigated quantity *p*:

$$p(t) = p\_0 + \sum\_{n=1}^{K} A\_n \cdot \cos\left(\frac{2\pi \cdot n}{T\_y} \cdot t + \phi\_n\right) \tag{6}$$

where *p*(*t*) is the summary statistic along the time *t* (expressed in min); *p*<sup>0</sup> is the mean value of *p*(*t*) in the whole year; *K* is the maximum number of goniometric functions (also named as harmonics) to be considered; *n* is the *n*-th harmonic function; *Ty* is total number of minutes in the whole year (here considered with 365 days); *An* corresponds to the amplitude for the *n*-th harmonic function; *φ<sup>n</sup>* corresponds to the phase shift for the *n*-th harmonic function.

Adoption of Equation (6) implies the estimation of1+2*K* parameters for each summary statistic, i.e., the annual mean value and the *K* couples regarding amplitude and phase shift for the harmonics.

Under the assumption that the seasonal variation regards all the five summary statistics, the proposed SRG is characterized by: 15 parameters if *K* = 1 for all, 25 parameters if *K* = 2 for all, 35 parameters if *K* = 3 for all and so on. Obviously, *K* can be also different from a summary statistic to the other.

For the selected case study (described in Section 2), STORAGE software was organized with the following assumptions:

(a) The quantities 1/*λ*,*θ*, 1/*β<sup>I</sup>* and 1/*β<sup>D</sup>* present a seasonal variation. Specifically, *K* = 2 is used for 1/*λ* (according to [52]):

$$\frac{1}{\lambda(t)} = \frac{1}{\lambda\_0} + A\_{1,\lambda} \cdot \cos\left(\frac{2\pi}{T\_{\text{y}}} \cdot t + \phi\_{1,\lambda}\right) + A\_{2,\lambda} \cdot \cos\left(\frac{4\pi}{T\_{\text{y}}} \cdot t + \phi\_{2,\lambda}\right) \tag{7}$$

where <sup>1</sup> *<sup>λ</sup>*<sup>0</sup> represents the mean annual value without any seasonal variation; *A*1,*<sup>λ</sup>* = <sup>1</sup> *<sup>λ</sup>*<sup>0</sup> − 1 *λ* min, and <sup>1</sup> *λ* min is equal to the smallest value for mean inter-arrival times between two consecutive storms; *A*2,*<sup>λ</sup>* = *ξ* · *A*1,*λ*; *φ*1,*<sup>λ</sup>* and *φ*2,*<sup>λ</sup>* are the phase shifts for the two adopted harmonics.

(b) As regards *θ*, 1/*β<sup>I</sup>* and 1/*βD*, we adopted *K* = 1:

$$\theta(t) = \theta\_0 + A\_{1,\theta} \cdot \cos\left(\frac{2\,\pi}{T\_y} \cdot t + \phi\_{1,\theta}\right) \tag{8}$$

$$\frac{1}{\beta\_I(t)} = \frac{1}{\beta\_{I,0}} + A\_{1,\mathfrak{E}\_I} \cdot \cos\left(\frac{2\pi}{T\_{\mathcal{Y}}} \cdot t + \phi\_{1,\mathfrak{E}\_I}\right) \tag{9}$$

$$\frac{1}{\beta\_D(t)} = \frac{1}{\beta\_{D,0}} + A\_{1, \mathfrak{E}\_D} \cdot \cos\left(\frac{2\pi}{T\_y} \cdot t + \phi\_{1, \mathfrak{E}\_D}\right) \tag{10}$$

where


These assumptions are compatible with the climatology of the Calabria region. In this part of Italy, the summer period is characterized by a lower average number of rain events with respect to the winter season. Moreover, the summer season usually presents rain events with higher intensities and shorter durations, compared with winter months, due to convective phenomena [53]. No seasonal variation (i.e., *K* = 0) is assumed for 1/*βW*.

Overall, calibration requires the estimation of twelve parameters: 1/*λ*0, (1/*λ*)min, *ξ*, *φ*1,*λ*, *φ*2,*λ*, *θ*0, *θ*min, 1/*βW*, 1/*βI*,0, 1/*βD*,0, *χ* e *η*.

Obviously, as also reported in Section 4, future developments of STORAGE will allow to consider a more comprehensive ensemble of combinations of *K* for the involved parameters, together with more flexibility about the phase shifts here fixed, in order to adequately model rainfall series in other climatic areas around the world.

#### 3.1.2. Calibration

An a priori ensemble of simulations, described below, was carried out and the results were filed into an "information reservoir" in STORAGE software, ready to be queried for a specific site of interest. In detail, all the previously mentioned twelve parameters were considered uniform random variables with assigned ranges of variation, reported in Table 1 [7,54]. Then, 50,000 parametric sets were generated with the Monte Carlo technique and, for each one, a simulation of a 200-year rainfall series with resolution of 1 min was carried out by using the same macros which were afterwards implemented in STORAGE. At the end, we filed in STORAGE software only the parametric sets for which the 200-year synthetic series presented summary statistics according to the variation ranges of those from the observed data of a wide area of interest (i.e., all the rain gauges of the Calabria region for the presented application). Specifically, we focused on the following summary statistics:


The results of this composite filter, constituted by a subset of 50,000 parametric sets, are illustrated in Section 4 for the Calabria region. The storage of this information further justifies the choice of the acronym STORAGE. In fact, the software allows to use, for the synthetic generation related to a single rain gauge of interest, parametric sets belonging to this pre-existing "information reservoir" (regarding a wide previously investigated area), for which the corresponding series of AMR, annual rainfall, seasonal rainfall and number of wet days are comparable with those related to the sample historical data. Obviously, this aspect considerably reduces the calculation times for the model calibration on a specific site of interest, with respect to a usual calibration procedure that is carried out without any a priori indication about possible model outcomes. It is clear that this "information reservoir" can be continuously updated when other areas are investigated as case studies. Moreover, refinement algorithms will be implemented in future versions of STORAGE, in order to enhance the performance of calibration for a specific rain gauge.


**Table 1.** Ranges of variation for parameters in STORAGE, according to [7,54].

#### *3.2. The User-Friendly Interface of STORAGE*

When a user executes STORAGE, after having enabled the VBA macros, the **Main** worksheet will appear as in Figure 3. Two different procedures are allowed for the generation of a synthetic rainfall time series, and each one is associated with a specific command button:

• **RUN with parameter values chosen by the user;**

• **PARAMETER ESTIMATION AND RUN.**

Moreover, in the Main worksheet, the user can manually stop the elaborations in progress with the related command button (**manual STOP to elaborations**).

**Figure 3.** Interface of the Main worksheet of STORAGE after enabling the VBA macros' content.

In addition to the **Main** worksheet, STORAGE contains the following worksheets:


Concerning the **Annual Maxima** worksheet, the series from 60 min to 24 h are estimated by considering the continuous series with a time step of 1 h. This choice is justified by the fact that many observed AMR series around the world were extracted, until 20–30 years ago, by using 1-h continuous data, while data with resolutions lower than 1 h are available only from 1990 or later [55]. Consequently, the comparison among synthetic and observed AMR series should be preferred by using this setting.

#### 3.2.1. Data Input

For both previously mentioned procedures of time series generation, it is necessary to insert the following input information before starting the elaborations:


If the option **RUN with parameter values chosen by the user** is selected, then the user has to fill all the cells from C10 to C22 (Figure 3).

On the contrary, if **PARAMETER ESTIMATION AND RUN** is chosen, then the user has to insert the following input data, which are sample estimates from the observed series of the investigated case study:

• The values of parameters for Amount–Duration–Frequency (ADF) curves, expressed as a power function:

$$h\_T(d) = a\_T d^{vr} \tag{11}$$

where *d* is the rainfall duration (hours) ranging from 1 to 24 h, *T* is the return period (years), *hT*(*d*) is the d-AMR associated with *T*, and *aT* and *nT* are ADF parameters. In detail, the values for *aT* and *nT*, associated with specific *T* values, are requested:


An example of Data Input is shown in Figure 4, if the option **PARAMETER ESTIMA-TION AND RUN** is selected by the user.

#### 3.2.2. Synthetic Generation of Rainfall Time Series at a High Resolution

After completing the Data Input step, it is possible to run one of the two generation procedures. In the following pages, attention is focused on the **PARAMETRIC ESTI-MATION AND RUN** button (Figure 4), which further allows for different generation

alternatives. The table and graphic outputs, associated to **RUN with parameter values chosen by the user** button (Section 3.3), are similar.

**Figure 4.** Example of Data Input step in the **Main** worksheet.

It must be highlighted that, in a worksheet hidden for the user, the results deriving from the use of about 3500 parametric sets, in terms of *aT* and *nT* for the ADF, MAP and the mean annual number of wet days, and mean cumulative seasonal rainfalls (DJF, MAM, JJA, SON), are stored. In detail (see also Sections 3.1.2 and 4), for each single parametric set, 200 years of precipitation were synthetically generated.

By clicking on the **PARAMETRIC ESTIMATION AND RUN** button, the userform shown in Figure 5 is displayed; from the combobox at the top (Figure 6) it is possible to select the statistical descriptors to be reproduced, i.e.,:


After the choice of the descriptors to be reproduced (for example, *aT*, *nT*, MAP, NumWetDays, DJF, MAM, JJA, and SON, as in Figure 6), it is possible to click on the **PARAMETRIC ESTIMATION** button. The software will display by default, in the cell range C10:C22, the parametric set (indicated with ID SET 1) which is characterized, among the 3500 used offline, by the best value (i.e., the lowest value) of the evaluated Objective Function (OF) (Figure 7), in percentage terms, as:

$$\begin{cases} \text{OF } a\\_n \\ \end{cases} \tag{10.7}$$

$$OF = \begin{cases} \stackrel{\text{CNOT\\_W}}{\text{OF}\ \text{MAP\\_NumWet}} \stackrel{\text{CNOT\\_I}}{\text{OF}}\\ \stackrel{\text{CIF\\_N}}{\text{OF}} \stackrel{\text{CNOT\\_I}}{\text{OF}} \stackrel{\text{CNOT\\_I}}{\text{OF}} \stackrel{\text{CNOT\\_I}}{\text{OF}} \end{cases} \qquad \begin{cases} \stackrel{\text{CNOT\\_I}}{\text{OF}}\\ \stackrel{\text{CNOT\\_I}}{\text{OF}} \stackrel{\text{CNOT\\_I}}{\text{OF}} \stackrel{\text{CNOT\\_I}}{\text{OF}} \end{cases} \tag{12}$$

$$\begin{array}{c c c} \hline \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \hline \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \hline \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \hline \end{array}$$

$$
\begin{cases}
\text{OF } u\_{\text{-}n} + \text{OF } \text{NumW } \\_ {\text{NumW } \text{EJwys}} \\
\text{OF } u\_{\text{-}n} + \text{OF } MAP \\_ {\text{NumW}} \text{det} \text{Days} + \text{OF } \text{Sesaus} \\
\text{0} \end{cases}
$$

in which:

$$\bullet \text{ OF } a\_{-}n = \sum\_{i=1}^{K\_a} \frac{|a\_i - a\_i^\*|}{a\_i} + \sum\_{j=1}^{K\_n} \frac{|n\_j - n\_j^\*|}{n\_j} \tag{13}$$

where:


$$\begin{array}{c} \text{OF}\,\,\text{MAP\\_NumWet}\,\text{Days} =\\ \frac{|\text{MAP} - \text{MAP\*}|}{\text{MAP}} + \frac{|\text{NumWetDays} - \text{NumWetDays}^\*|}{\text{NumWetDays}} \end{array} \tag{14}$$

where *MAP* and *NumWetDays* are the sample values which are inserted by the user, while *MAP*∗ e *NumWetDays*∗ are the correspondent NSRP values.

$$\begin{array}{c} \text{OF Seasons} = \\ \cline{2-4} \frac{[DIF-DIF^\*]}{DIF} + \frac{[MAM-MAM^\*]}{MM} + \frac{[IIA-IIA^\*]}{IIA} + \frac{[SON-SON^\*]}{SON} \end{array} \tag{15}$$

where *DJF*, *MAM*, *JJA* and *SON* are the sample values which are inserted by the user, while *DJF*∗, *MAM*∗, *JJA*∗ and *SON*∗ are the correspondent NSRP values.

Whatever option is selected in the combobox, STORAGE will provide the correspondent values for all the OFs (Equations (13)–(15)) for a specific parameter set.

Moreover, by using the spin button (Figure 8), it is possible to adopt other parameter sets for simulation, which are sorted (by STORAGE in the hidden worksheet) on the basis of the values related to the selected OF.

After the choice for parametric set, the user can click on the **RUN** button for carrying out the generation of a synthetic rainfall series.

During the run, the user can control the progress of generation by analyzing the several worksheets in STORAGE.xlsm. As examples, the cell D5 in **Main** (Figure 9) and the histogram for Annual Rainfall (Figure 10) can be checked.

A message box will appear when simulation is completed. Then, the final results can be analyzed in the several tables and plots of STORAGE (Figure 11), while the whole synthetic rainfall series at the selected high resolution (cell D4 in Main), will be printed in "C:\NSRP\RainSim.txt".

As explained in the following sections, STORAGE also allows for rainfall generation with multisets approaches, as an alternative way to the run with a single parametric set.

**Figure 5.** Userform where a user can carry out calibration and select several run options.



**Figure 6.** Example of procedure for calibration, by using the combobox at the top of the userform.

**Figure 7.** Example of calibration results in the Main worksheet.

**Figure 8.** Example of calibration by using the spin button, which allows for using other parameter sets for simulation.

**Figure 9.** During the run, the user can control the progress of generation by checking the number of simulated years in the cell D5 in the **Main** worksheet.

**Figure 10.** During the run, the user can control the progress of generation by analyzing the histogram for Annual Rainfall.

**Figure 11.** Examples of visualization in the different worksheets when the simulations are completed.

#### 3.2.3. Multisets Approaches

Focusing on Option 4 of Equation (12), different parametric sets can be characterized by very similar OF values among them, but some sets could better reconstruct ADF curves, while other ones could best fit MAP and NumWetDays, and so on.

In this context, if the fourth option of Equation (12) (i.e., *aT*, *nT*, MAP, NumWetDays, DJF, MAM, JJA, SON) is chosen as an ensemble of statistical descriptors to be reproduced, the user can take advantages from several parametric sets by selecting one of these two options concerning multisets approaches (Figure 12):


The proposed multisets approaches are based on the concept of equifinality [56], which means that "different parametric sets within a chosen model structure may be behavioural or acceptable in reproducing the observed behaviour of that system".

Ranking from Total OF

It is possible to select *S* parametric sets (sorted with increasing values of Option 4 in Equation (12)) by using the spin button of Figure 12.

**Figure 12.** Example of multisets approach regarding the option "Ranking from total OF".

Automatically, STORAGE will assign (to a specific set) a frequency of use which is inversely proportional to its overall OF value (Option 4 in Equation (12)). In detail, let *fi* be the frequency of use for the *i*-th parametric set (*i* = 1, ... , S) and *OFi* its corresponding value of OF; *fi* is computed as:

$$f\_i = \frac{\frac{1}{\overline{OF}\_i}}{\sum\_{i=1}^{S} \frac{1}{\overline{OF}\_i}} \tag{16}$$

with, obviously, *<sup>S</sup>* ∑ *i*=1 *fi* = 1.

Then, considering the total number *N* of years to simulate (input data in cell D3 in the Main worksheet, Figure 4), the number *fi* · *N* of years will be generated with the *i*-th parametric set.

It should be highlighted that:

• if a multisets approach is selected, a user should consider at most *S* = 4 and a large value for *N* (we suggest *N* = 500 years), in order to have a significant number of years for each set (with N = 500 years and S = 4, there are on average 125 years which are simulated with each set);

• in a context, such as in this case, of stationary/cycle-stationary process (i.e., without any climatic trend), it is not necessary to generate a large number *L* of *N*-year synthetic series (in which each *i*-th set should regard *fi* · *L* series), but it is sufficient to consider the generation of only one year, which is repeated *L=N* times. This is allowed by the ergodicity property of a stationary process [57], which means that the statistics from a long temporal *N*-year series are equal to the statistics from one year (generated *N* times).

After clicking on the **RUN** command button (Figure 12), the user is able to check the progress of the rainfall generation, similarly to the procedure with only one parameter set (Figures 9–11). It is clear that this approach can be well used for a more comprehensive sensitivity analysis (i.e., not only related for the first ranked parametric sets) in further upgraded versions of STORAGE software.

#### Merging Different OFs

This approach can be carried out in two options:


In the first case, from the worksheet (hidden for the user) where the information of the offline generations with about 3500 parametric sets is stored, the VBA code selects the three parametric sets with the lowest values for, respectively, Equations (13)–(15). Then, STORAGE will assign to each selected set a frequency *fi*, evaluated by considering Option 4 of Equation (12) as *OFi* in Equation (16).

In the second case (4 OFs), the parametric set with the lowest value of the overall OF (option 4 of Equation (12)) is also considered, together with the three above mentioned sets.

It must be highlighted that these two options are allowed by STORAGE only if all the 3 OFs of the first option are inside the first 10 positions of the ranking for OF calculated with Option 4 of Equation (12).

Also in this case, after clicking on the 3 OFs or 4 OFs buttons (Figure 13), the user is able to check the progress of the rainfall generation, similarly to the procedures with only one parameter set (Figures 9–11).


**Figure 13.** Use of multisets approach regarding the option "Merging different OFs".

#### *3.3. RUN with Parameter Values Chosen by the User*

This option allows for manually setting the values for the parameters in the interval C10:C21 of cells in the Main worksheet (Figure 4). Also in this case, after clicking on the corresponding command button for the run, the user is able to check the progress of the rainfall generation, similar to the previous described procedures. The ranges of variation for parameters are reported in Table 1, according to [7,54].

#### **4. Application for Rain Gauge Network of the Calabria Region and Discussion**

As regards the application for the Calabria region, we saved in STORAGE about 3500 parametric sets, for which the 200-year synthetic series presented summary statistics ranging inside specific intervals (according to the observed data in the whole region). In detail:


By applying this composite filter, graphical comparisons among synthetic and observed summary statistics are shown in Figures 14 and 15. From analysis of these dispersion plots, the STORAGE good reconstruction for the investigated rainfall descriptors can be assessed.

**Figure 14.** Comparison among synthetic and observed summary statistics.

**Figure 15.** Comparison among synthetic and observed summary statistics.

For the sake of brevity, focusing on specific rain gauge data series, examples of STORAGE application are below described for Montalto Uffugo, Reggio Calabria and Vibo Valentia stations. Their associated sample values for the statistical descriptors are reported in Tables 2 and 3.


**Table 2.** Montalto Uffugo, Reggio Calabria and Vibo Valentia rain gauges: values of parameters concerning ADF curves.


**Table 3.** Montalto Uffugo, Reggio Calabria and Vibo Valentia rain gauges: sample values of mean annual and seasonal precipitation, and mean annual number of wet days.

> For all the three stations, 500-year synthetic rainfall time series with a resolution of 5 min were generated, and we carried out model validation by analyzing the reproduction of frequency distributions for sample data of AMR, annual and seasonal rainfall, and annual number of wet days. The best STORAGE performances were obtained:


For Montanto Uffugo rain gauge, STORAGE provided a 500-year synthetic rainfall time series which satisfactorily reproduces the frequency distributions of AMR sample data (see the EV1 probabilistic plots in Figure 16), with an over-estimation only for 24-h AMR series. The reproduction of the frequency distributions concerning sample series for annual rainfall, annual number of wet days, and seasonal precipitation in DJF, MAM and SON is analyzed on Gaussian plots (Figure 17): a slight underestimation is obtained only for JJA rainfall. As regards Reggio Calabria and Vibo Valentia rain gauges, the obtained results (Figures 18–21) highlighted some crucial aspects to be investigated further when future developments in STORAGE software will be carried out. In detail:


Starting from this latter aspect, a more in-depth investigation of the maximum number of harmonics for some quantities, and of their phase shifts, could justify the STORAGE application also in regions far from the investigated area, i.e., characterized by drier or wetter climates. This obviously means to increase the number of parametric sets to be stored in the software.

Further analyses of STORAGE performances were carried out focusing on Montalto Uffugo rain gauge, characterized by 30-year continuous time series at resolutions of 20 min. Such analyses aim to evaluate the model capacity for reproducing summary statistics of high-resolution continuous series (not used for STORAGE calibration) and to compare the STORAGE results with those from a standard NSRP (i.e., calibrated by only using continuous high-resolution data). In details:

• we calibrated a basic version of NSRP with the 1-h continuous data series (aggregated from the available 20-min one), by estimating parameters for each month (according to [14]) in order to avoid possible underestimation of extremes (as mentioned in the introduction). This version of NSRP is indicated as NSRP\_v0 in the following;

	- mean, standard deviation and percentage of dry intervals from the continuous series at 20-min and 1-h resolutions;
	- mean values of monthly rainfall heights;
	- rainfall heights of ADF curves for return periods T = 5, 50 and 200 years.

Concerning the summary statistics of the continuous series, it is clear that NSRP\_v0 provides the best performances for 1-h resolution, because this time step was used for NSRP calibration in this case. However, the obtained STORAGE results for 1-h data series can be considered acceptable for the mean and percentage of dry intervals (Table 4 and Figure 22). For a 20-min resolution, STORAGE and NSRP\_v0 performances are comparable (Table 4 and Figure 23).

**Table 4.** Montalto Uffugo rain gauge: evaluation of STORAGE and NSRP\_v0 performances. RMSE values related to the mean, standard deviation and percentage of dry intervals for the continuous 20-min and 1-h series.


Moreover, monthly rainfall heights are very well reproduced by STORAGE, as seasonal rainfalls are used for its calibration (Sections 3.2.1 and 3.2.2), but NSRP\_v0 results can be also considered good: RMSE values are 7.5 and 14.4 mm for STORAGE and NSRP\_v0, respectively (see Table 5 and Figure 24).

**Table 5.** Montalto Uffugo rain gauge: evaluation of STORAGE and NSRP\_v0 performances. RMSE values related to the mean of monthly rainfall heights and ADF curves.


The clear benefit of using STORAGE is highlighted by focusing on ADF curves (Table 5 and Figure 25). As expected, STORAGE provides a very good reconstruction (RMSE values are comprised between 5.5 and 6 mm) because it is calibrated with *aT* and *nT* of the sample ADF curves (Sections 3.2.1 and 3.2.2). On the contrary, NSRP\_v0 significantly overestimates rainfall extremes in this specific case; as its parametric estimation is only based on summary statistics from high-resolution continuous series, an acceptable reproduction of ADF curves could not be guaranteed in general (such as for Montalto Uffugo rain gauge), also by using monthly or seasonal parameter sets.

This last comparison allows us to remark the most important aspect of the usefulness of STORAGE, i.e., the possibility of calibrating an SRG by only using information at coarser resolutions (AMR, MAP, and so on) and then generating continuous series which preserves sample features (often un-known for lack of data) in an acceptable way at high resolutions.

**Figure 16.** Montalto Uffugo rain gauge: EV1 probabilistic plots, showing the comparison among synthetic and observed AMR series.

**Figure 17.** Montalto Uffugo rain gauge: Gaussian probabilistic plots, showing the comparison among synthetic and observed series, regarding annual and seasonal rainfall, and annual number of wet days.

**Figure 18.** Reggio Calabria rain gauge: EV1 probabilistic plots, showing the comparison among synthetic and observed AMR series.

**Figure 19.** Reggio Calabria rain gauge: Gaussian probabilistic plots, showing the comparison among synthetic and observed series, regarding annual and seasonal rainfall, and annual number of wet days.

**Figure 20.** Vibo Valentia rain gauge: EV1 probabilistic plots, showing the comparison among synthetic and observed AMR series.

**Figure 21.** Vibo Valentia rain gauge: Gaussian probabilistic plots, showing the comparison among synthetic and observed series, regarding annual and seasonal rainfall, and annual number of wet days.

**Figure 22.** Montalto Uffugo rain gauge: comparison between STORAGE and NSRP\_v0 performances, focusing on the mean, standard deviation and percentage of dry intervals for the continuous 1-h series.

**Figure 23.** Montalto Uffugo rain gauge: comparison between STORAGE and NSRP\_v0 performances, focusing on the mean, standard deviation and percentage of dry intervals for the continuous 20-min series.

**Figure 24.** Montalto Uffugo rain gauge: comparison between STORAGE and NSRP\_v0 performances, focusing on mean values of monthly rainfall heights.

**Figure 25.** Montalto Uffugo rain gauge: comparison between STORAGE and NSRP\_v0 performances, focusing on ADF curves.

#### **5. Conclusions**

The developed STORAGE software constitutes a very useful user-friendly tool for generating long rainfall time series at high resolutions, which could be applied as input data in many hydrological analyses, such as in the continuous rainfall-runoff modeling.

The innovative aspects of the software regard: (i) the possibility of using information, for model calibration, from observed time series which are longer than continuous data sample at high resolutions; (ii) the modelling of seasonality by adopting goniometric series, which allows for a more parsimonious approach with respect to considering monthly parametric sets (as is usually done).

The presented version of STORAGE software, available at https://sites.google.com/ unical.it/storage, is currently suitable for the reproduction of rainfall series which exhibit a clear EV1 behaviour in terms of AMR and present values of annual and seasonal precipitation that are typical of the Mediterranean area.

Future developments will concern: (i) the extension of the ensemble of the parametric sets and the possibility to use other probability distributions for some rainfall features and other shapes besides the rectangular one for rain cells, in order to apply the model in other regions with different climates with respect to the investigated area; (ii) the implementation of a module for obtaining perturbed synthetic series, which can be representative of future hypothesized rainfall scenarios on spatial and temporal hydrological scales.

Moreover, the authors consider as very important the possibility of implementing in STORAGE specific modules related to soft computing methods (widely used in recent literature [60–62]), in order to provide different approaches for a specific case study. This aspect will allow to immediately compare the performances of an SRG (having a mathematical structure which is "physically-based", as it models some aspects of rainfall genesis, see Figure 2) with those from approaches such as Artificial Neural Networks (ANNs), Support Vector Regression (SVR) and Fuzzy Logic (FL), which are characterized by high nonlinearity, flexibility and data-driven learning.

**Author Contributions:** Conceptualization, D.L.D.L. and A.P.; methodology, D.L.D.L. and A.P.; software, D.L.D.L.; validation, D.L.D.L. and A.P.; data curation, D.L.D.L.; writing—original draft preparation, D.L.D.L. and A.P.; writing—review and editing, D.L.D.L. and A.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Acknowledgments:** This work was supported by the Italian Ministry for Ecological Transition (Ministero della Transizione Ecologica—Direzione Generale per la sicurezza del suolo e dell'Acqua-SUA), through the Law 5/1/2017 n. 4 "Interventi per il sostegno della formazione e della ricerca nelle scienze geologiche, per progetti di ricerca finalizzati alla previsione e alla prevenzione dei rischi geologici".

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

