1. Introduction
The design of rainwater management structures and waterworks, the determination of floodplain limits, and the evaluation of river structures’ safety for gauged or ungauged watersheds requires water discharge information. The underestimation of the quantity and velocity of runoff in a region can cause great damage. Clear examples of this are the 1986 disaster in California when the Yuba River broke a levee and damaged almost 4000 homes [
1] and when, in 2010, hurricane Alex brought to Mexico flood damages up to 1.5 billion dollars mainly in the city of Monterrey [
2]. It is of great importance then to have an accurate estimation of the relations between rainfall and runoff, especially if they can be applied in a simple manner and obtained from precise and available data like topography, geomorphology, or measurable characteristics of the watershed.
Since precipitation varies constantly within a watershed and the availability and precision of pluviometric data can be questionable, especially in developing countries like Mexico, many researchers have focused their efforts on understanding the relationships between the watershed’s geomorphology and its hydrology through empiric relations using synthetic (the term “synthetic” suggests that it has been determined based on watershed measurable characteristics and not on the runoff-rainfall relationship information) methods [
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13] and not the actual hyetograph (graphical or tabular representation of the distribution of rainfall intensity over time) information. Examples of this approach can be found since 1938 when Snyder determined a relation between the rainfall duration and the basin lag time [
3]. In 1957, the SCS (the Soil Conservation Service, now the Natural Resources Conservation Service (NRCS)) method was developed by the U.S. Department of Agriculture [
5], which calculates a runoff coefficient known as the Curve Number (CN) based on the land use, the soil hydrological group for antecedent moisture conditions, and hydrological conditions. These two methods are widely used even though their accuracy is low and their subjectivity high, because they are practical and easier to use than the more precise methods like the ones based on probability distribution functions like Gamma, Beta, or Weibull [
7,
9,
14], which in turn present their own disadvantages like an infinite base time for the hydrograph since the recession limb is described asymptotically to zero flow or the trial and error calibration of parameters.
A useful approach is to consider the direct runoff hydrograph for a given watershed that results from one unit (inches or mm) of constant effective rainfall in a certain time; this is known as the Unit Hydrograph (UH). A UH is different for each watershed, and therefore, this simplification permits the comparison among the same scale hydrographs for different watersheds. Even though the UH theory assumes a spatially uniform and constant effective rainfall and a linear response of the watershed, it still approximates the flood response reasonably well, and any event’s response can then be calculated for a watershed, if the UH is determined, using the superposition and proportionality principles [
15].
A Synthetic Unit Hydrograph (SUH), then, enables the runoff estimation to be carried out with limited rainfall-runoff data or for ungauged watersheds based on their physical characteristics [
16,
17]. According to Ponce [
18], a synthetic hydrograph does not require rainfall-runoff data, and it can simulate watershed diffusion by estimating the watershed lag through the use of empirical formulas. The main advantage of using the synthetic hydrograph is that its shape can be completely determined based only on a few parameters from the watershed’s physical characteristics [
19].
The more recent attempts at developing unit hydrographs through synthetic methods have focused on trying to relate the watershed response to the characteristics of the channel network or its geomorphology. According to Beven [
20], “these geomorphological linkages result in structural regularities in the form of catchments and it should be possible to take advantage of these regularities in making hydrological predictions”. Many researchers [
21,
22,
23,
24], since Rodriguez-Iturbe and Valdes first proposed the geomorphologic approach [
8], have worked on simplifying the resulting predictions in terms of the unit hydrograph.
The simplification and combination of different methods have resulted from continuous research looking to increase the precision and practicality of the resulting hydrographs. Some of these cases are the simplified two parameter Gamma distribution [
12], the width function based-GIUH [
10,
13,
25,
26,
27,
28,
29], and GIUH-Nash [
22], among others.
Since numerous studies show that a relationship between the characteristics of the watershed and the shape of the hydrograph exists, it can be concluded that, even though these relations are complex, it is possible to develop a method that describes them in an efficient manner. This encourages searching for a method based on synthetic procedures that provide more precise results than those currently used, through a more realistic simulation of the rainfall-runoff process. In this study, physical simulations of the rainfall-runoff process were carried out, looking for more real-life results for the analysis of the relationships between topography and runoff. This information could allow the results of the synthetic methods currently used to be enhanced and determine better fitted hydrographs even for ungauged watersheds taking as input only their topographic profiles from a Digital Elevation Model (DEM).
The purpose of this study is to present the experimentation, results, and analysis that confirm that the main hydrograph parameters are attainable through the watershed topography information.
2. General Description of the System
Based on the necessity of a deeper understanding of the relations between watershed characteristics and hydrograph parameters and on the fact that water behavior is too complex to model it appropriately, a 3D runoff simulator was designed and built to run physical experiments in order to search for the main topographic characteristics that determine the hydrograph shape of a watershed.
The complete model is comprised of four main parts: topographic surface, gantry robot, rainfall simulator, and runoff collector (see
Figure 1).
The actual model can be seen in
Figure 2.
The topographic model allows reproducing various types of 3D terrains at different scales through the elevation of rigid columns according to the DEM of each terrain. The elevation of the columns was done one-by-one by a gantry robot, and its accuracy was tested with a sample of 100 cells. The average error resulted in 1.41% calculated between the average elevation obtained from ArcGIS and the sampled scaled elevation within the physical model.
The cell size (3 cm by 3 cm) of the model was chosen as reduced as the physical elements and materials allowed and validated according to the distributed hydrological models recommended by Guerra-Cobian [
30] since important scale issues arise due to this in hydrological simulations of watersheds with digital and physical models to determine hydrological parameters.
Once the terrain was complete, it was fully covered by a water-resistant liner. Since only one type of cover was used, the roughness of the terrain remained constant.
Precipitation was simulated through a seven nozzle fog jet sprinkler located at the center of the model and 1.50 m above the topographic surface.
The simulation’s runoff was collected in an acrylic tank that was connected to a PVC pipe. Here, a pressure transducer was placed at the bottom to measure the water level. The transducer was configured to obtain one sample per second. This setting was designed to reduce the noise of the level readings, generated by the falling water in the tank.
Restrictions and Limiting Conditions
Physical simulation of the rainfall-runoff process has its limitations, and they must be taken into account for the experiments and the analysis of results. These limitations are:
(1) Type of terrain and storage: Since the physical model does not allow reproducing different types of terrain, the watersheds to use in the experimentation should be as virgin as possible in order to avoid extensive urban areas and/or large hydraulic structures like reservoirs that have a great influence on the runoff behavior.
(2) Roughness: The model has a fixed value of roughness determined by the waterproof cover used.
(3) Moisture: Moisture conditions are considered such as saturated soil, since only direct runoff is simulated and the imperviousness of the model does not allow for different conditions.
(4) Homogeneity: Due to the assumption of homogeneous rainfall distribution over the watershed, the model is suitable only for small and medium sized watersheds, where this spatial assumption applies.
(5) Scale: Small and medium sized watersheds also minimize scale effects; therefore, these limited the study. In this case, watersheds between 170 and 980 km2 in extent were studied.
3. Methodology
Since Gu et al. [
31] addressed the weakness of current watershed studies and recommended the use of physically based hydrological models in order to advance hydrologic science, the experimental design was used to establish cause-effect relationships among a group of variables.
Figure 3 shows the experimental methodology followed in this study.
A sample of nine real watersheds was used, selected from a region with a high frequency of rainfall events, which allowed for a higher amount of available hydrological data. The watersheds selected corresponded to relatively virgin terrains—no large urban areas and/or important hydraulic structures—and also a relation between the x, y, and z axis scale, which allowed for a distortion ratio below 2 (i.e., x axis scale/z axis scale < 2) at the time of scaling it to the physical model dimensions. Since no recommendations in the literature were found on the acceptable distortion of hydrological models, a low value was selected, but this reduced the watershed sample to a model where the distortion ratios ranged between 0.96 and 2.
The DEM of the sampled area was obtained from INEGI [
32] with a pixel size (X, Y) of 15 m. Each watershed selected represented a topographic sample, and for each sample, the DEM information was processed using ArcGIS 10 by ESRI (see
Figure 4) and Excel 2010 by Microsoft (see
Figure 5) to obtain an adequate format to feed the robot. The elevations matrix obtained through the processing of the DEM data was validated through a Visual Basic (VB) code (see
Appendix B) that colored each cell according to the elevation in order to confirm that the arrangement agreed with the original watershed information (see
Figure 5). The scaled matrix in cm, as in the physical model, was imported to ArcGIS to be processed and to obtain the morphologic parameters of the scaled watershed. Once the data were arranged in matrix form, they were then formatted into the Arduino language to be read by the controller of the robot.
4. Parameters
According to Sukristiyanti et al. [
33], the morphometric analysis of a watershed helps in the interpretation of its shape and its hydrological characteristics, as well as in a comparative analysis among different watersheds. Several morphometric parameters were analyzed for every watershed. A number of parameters were measured using ArcGIS (watershed area
A, length of the main stream
L, maximum elevation
, minimum elevation
, distance from the outlet to the centroid of the watershed
, length of the stream of highest order
, average watershed slope
, watershed perimeter
P, and watershed axial length
) or calculated through formulas (main stream average slope
S, area ratio
, bifurcation ratio
, length ratio
, circularity ratio
, and form ratio
). International system units were used for all of the calculated parameters:
The hydrograph parameters used are: unit hydrograph peak flow (
), unit hydrograph time to peak (
), unit hydrograph base time (
), and unit hydrograph runoff volume (
V) obtained from the unit hydrograph flow data as:
where
V is in m
3/s/cm/h,
is flow at time
i in m
3/s/cm, and
is time
i in h.
5. Study Cases
The nine watersheds selected are located in the state of Chiapas, Mexico. The state is located in the southeast of the country; it borders to the north the state of Tabasco, to the west the states of Veracruz and Oaxaca, to the south the Pacific Ocean, and to the east Guatemala. It has a total area of 74,415 km
2, which represents 3.8% of the total country’s area, which makes it the eight largest state in the country. Its total urban area is only 0.36% according to the soil use and vegetation layer from CONABIO [
34].
The state capital is Tuxtla Gutierrez with a total population of 598,710 inhabitants to 2015 [
35]. This and other important cities in Chiapas can be located in
Figure 6.
Chiapas’ climate is mainly warm and tempered with high precipitation throughout the year in the north region and in the summer season in the rest of the state. The total annual rainfall varies from 1200–4000 mm. The mean annual precipitation can be seen in
Figure 6. The mean annual temperature varies depending on the region from 18–28
C. The selected watersheds can be located in
Figure 6.
The topographic, hydrological, and cell size parameters for each watershed can be seen in
Table 1.
6. Experimentation
Two phases of experimentation were carried out. The first phase had the objective of testing the homogeneity of the simulated rainfall at various pressure and height levels of the sprinkler. The second phase of experiments included the actual rainfall-runoff simulations starting with the topography reproduction, followed by the rainfall simulation over the terrain and finally collecting and recording the runoff at the outlet.
6.1. Sprinkler Pre-Testing
Three different experiments were held using different configurations of the sprinkler: as delivered by the manufacturer, with the center nozzle totally closed, and with the center nozzle orifice reduced.
An Inverse Distance Weighting (IDW) interpolation was done in ArcGIS using the average millimeters of the water column measured at different points over the model area. The results for the 45 psi for the closed nozzle and 30 psi for the reduced nozzle configurations can be seen in
Figure 7,
Figure 8 and
Figure 9. Each level of pressure in the different configurations is related to rainfall intensity between 39 and 141 mm/h.
It was found that the variance of the data increased with pressure and decreased with height, the height being a more significant factor of change. It was decided to place the sprinkler at our highest physical limit of 1.50 m. The configuration with the closed nozzle resulted in a better dispersion of water over the complete surface, but the configuration with the reduced orifice resulted in lower variance for half of the simulation area. Since some of the study watersheds cover only half of the surface, this configuration was used in those cases. The first configuration resulted in very poor performance of homogeneity and therefore is not reported.
6.2. Rainfall-Runoff Process Experiments
Each of the nine watersheds selected was reproduced in the physical model and rainfall simulated over the terrain at different intensities to obtain runoff data related to them. For every experiment, level data in inches of the water column were measured every second at the collection tank placed under the watershed outlet. This was done from the moment the rainfall simulation started and until the level stabilized or stopped accumulating. The level data were then fit to a log-logistic function to eliminate noise and converted into runoff by obtaining the derivative of the log-logistic function, which was then plotted in time to produce the hydrographs to be analyzed for each watershed. The methodology followed can be seen in
Figure 10.
The first part (Steps 1–2) is related to the topography reproduction of the watershed for which a scale factor related to the z axis must be configured to convert the elevation data from meters to the model scale in centimeters. Once the watershed is modeled, rainfall is simulated according to the previous experiments during a relatively long time (5 min was sufficient to saturate the watersheds in all cases) to obtain first the time of concentration (
) (Steps 3–8). Plotting the measured flow at this stage results in a flat bell shape as shown in
Figure 11 where
is observed at the moment when the flow starts to be relatively stable—the equilibrium discharge—in this case at 280 s. A Loess smoothing was applied in R language to reduce the noise and obtain a better estimation of the experimental time of concentration.
Once
was known, rainfall was simulated over a time lower than this to obtain a bell-shape hydrograph; this was before the watershed became saturated. The data were recorded until the water level was stable again; this means that, even after the rainfall had stopped, the data were still being stored. To validate the test, the water level data were then converted to flow and plotted against time to obtain a hydrograph as the one shown in
Figure 12 (Steps 9–16).
7. Physical Simulation Results
Five different pressures of the sprinkler related to different simulated rainfall intensities were tested for each watershed. For each individual test, the runoff was obtained from the measurements of a pressure transducer in inches of the water column that were then converted into millimeters of the water column. A log logistic function was fitted to this data to eliminate the error or noise caused by waves and other unknown perturbations.
Several tests were run from which different unit hydrographs and their parameters were obtained. All of the results for Chicomuselo’s tests are summarized in
Table 2.
The theta parameters correspond to the log logistic function fit according to Equation (
8), obtained by minimizing the Sum of Squared Errors (SSE) through Minitab’s non-linear regression feature. It should be noted that a higher SSE usually relates to more perturbations during the test, although this value is only comparable among results from the same watershed due to the differences in magnitudes.
The hydrograph and its parameters (
qp,
tp,
tb, and
V) were then obtained from the derivative of the log logistic function, through Equation (
9), which resulted in runoff data or the volume of water over time. In this case,
a represents time in seconds,
b mmH
2O, and
mmH
2O per second, which can be converted to m
3/s by multiplying mmH
2O in meters by the caption tank area, which is 0.1726 m
2, including the sensor container area used to reduce noise (liters were used to present the data in
Table 2 for better visualization since the quantities of water during the simulation were very low).
Due to the changing conditions of a partly controlled physical simulation, the Nalimov test [
36] was applied to each watershed’s experimental data in order to eliminate the presence of outliers. This test is based on the elimination of data having an
N parameter (see Equation (
10)) higher than the critical value obtained from reference tables according to the sample size. This test was chosen for its applicability to small samples as is the case for each watershed’s experiment set.
where
is the sample mean,
is the standard deviation of the sample, and
n the number of values in the sample.
8. Data Analysis
Analysis on the final set of the experimental data of all nine watersheds was carried out to obtain an expression that could represent the hydrograph parameters based on the topography of the scaled watershed as stable and predictable as possible. Two different experiments were made, first using individual polynomial regressions for each response variable on the final dataset and second using a single fuzzy logic model for all response variables.
Before fitting the data into models, a feature selection technique was used in order to remove predictor variables that did not contribute in a meaningful way to each response variable. This was carried out with a neighborhood component analysis for regression data [
37], where the generalization of all executions, i.e., for all response variables, was shown to be:
A,
S, and
. Although
A had the highest importance in general,
S and
were close behind in importance affecting the modeling for each response variable. The following experimentation analysis was executed with these variables only.
A polynomial regression, with a higher degree for each term, was selected in order to best describe the complex behavior of the relation between chosen predictor variables and response variables. Four models were trained, one for each response variable
,
,
, and
V; each using 80% of the data for training and the other 20% for validation. Two different base models were utilized, the first shown in Equation (
11), where the polynomial is third order, used for predicting
, and Equation (
12), where the polynomial is second order, used for predicting
,
, and
V.
where
is the model output for a desired response variable,
is the coefficient for the nth term, and
,
, and
are the predictor variables
A,
S, and
, respectively. In Equation (
11), the higher complexity is quite clear when compared to Equation (
12); although the model for the first response variable,
, can be fitted using the lower complexity of the model from Equation (
12), the correlation coefficient results in a value of 0.88. On the other hand, if Equation (
11) is used, the correlation reaches up to 0.94. All coefficients used for the four polynomial regression models are shown in
Table 3, where the
,
, and
V response variable models only require 10 coefficients as their models are simpler (second order polynomial regressions), whereas the model for
requires double the number of coefficients (third order polynomial) compared to the latter. It is worth noticing that the model for
does not require all coefficients, as these values are zero, but the more complex operations between these variables is what improves the performance of the model; i.e., the model could be compacted into a simpler model without sacrificing performance.
The performance of the four models is shown in
Table 4, where the coefficient of correlation,
, was used. It should be noticed that the response variable
did not perform as well when compared to the others, especially when considering that this model required a far more complex model. This can be explained by the fact that
has a very complex and unpredictable behavior when compared to the other three response variables.
Following the previous model’s performance, another experiment was carried out in order to find the minimum number of response variables required to achieve an acceptable performance. For this case, a fuzzy logic model was used, more specifically a higher order Sugeno Type-1 fuzzy logic system [
38]. In contrast to the first polynomial model shown above, less response variables are required to achieve a slightly better performance.
To allow for a direct comparison to the previous polynomial regression models, the same data separation scheme was used, that is 80% of the data for training and the remaining 20% for validation. It must be noted that, due to the more complex model training technique used by the fuzzy logic model, it was possible to achieve good results through only one response variable. Using this technique, response variable A was enough to model the behavior of all four predictor variables. Here, two hyperparameters for model training are required: the order of the Sugeno consequent and the radii used by the internal clustering of the technique. In this case, these two parameters were set to 2 and 0.8, respectively, for the best results.
The main characteristic of second order Sugeno consequents, as proposed by Castro et al. [
38], is how they represent the output, that is, as shown in Equation (
13), they can better model a more complex behavior when compared to first or zero order consequents.
where
is the output of the lth rule,
is the nth coefficient of the lth rule, and
is the input value.
This resulted in a fuzzy logic model with 1 input, 4 outputs, and 3 rules (see
Figure 17). The parameters of the model can be seen in
Table 5, where the first set of parameters is for the antecedent Gaussian membership functions and the second set is for the coefficients for all second order Sugeno consequents.
Lastly, the performance was also measured through the coefficient of correlation
, obtaining the results shown in
Table 6. By comparing these results to the polynomial regression models, the fuzzy logic model clearly outperformed the latter, especially when considering that only one model was used, instead of four, and that better performance was achieved for two predictor variables,
and
V. This improvement was mostly due to the better training algorithm of the technique used, which used recursive least squares estimation parameter adjustment, compared to the polynomial regression, which uses least squares estimation parameter adjustment. Furthermore, fuzzy models can better adjust to complex behaviors by incrementing the number of rules. In the end, fuzzy logic resulted in a simpler model, which could achieve better modeling traits when compared to polynomial regression models.
9. Discussion
The analysis of the experimental results shows that the hydrograph parameters of a watershed can be estimated with a high degree of accuracy through polynomial relations of second and third order of its topographic parameters, the most significant being the drainage area, main channel average slope, and length of the stream of highest order, or through fuzzy logic, where a better fit and less variables are needed, in this case only the drainage area. These relations were possible to obtain by enabling the repetition of different hydrological events, which is not possible in real life, through the use of a physical model that reproduces a specified terrain and simulates homogeneous rainfall over the area to finally measure the resulting direct runoff. Different watersheds under different rainfall intensities were simulated in order to reduce the degree of uncertainty of the results.
It is interesting to note that the resulting fuzzy logic model obtained the minimum number of response variables required to achieve an acceptable performance as the watershed drainage area. This is comparable to the widely used fixed base method of flow separation in hydrographs, which obtains a time after peak estimate based only on the watershed area [
39].
Once the main parameters of the hydrograph are estimated, in this case peak flow, time to peak, base time, and volume, the complete shape of the hydrograph can be obtained, for example through the use of Hermitian interpolation [
11], which does not include subjectivity or calibration techniques, and it provides a good fit when compared to observed hydrographs. Therefore, it is possible to finally estimate a hydrograph of a watershed based only on topographic information.
Although the roughness coefficient was not estimated in the study, it is considered that there are no relative effects on the results, since all the experiments were carried out with the same cover. It is clear that further analysis should be made in order to determine the relevance of this factor on the final results. This leads to an investigation that may begin in the near future.
The effects of the distortion were not included in the study since the analysis was based on the experimental data collected and no relation was attempted for the models to be applicable in real scale watersheds, which in this case were only a base for the construction of the experimental watersheds, but not a comparable reference between the resulting data.
The time scale is also very relevant for future research in order to find the relations between the experimental results and the real scale watersheds, but it is not part of the present work.
The next step in this process should be to develop a methodology that includes the effects of roughness, distortion, and time scale in order to transform the models found to be applied to the real scale. Since the degrees of magnitude for real watersheds are much higher than the experimental watersheds used, direct extrapolation is not possible.
10. Conclusions
The main outcomes of the study include, first, the fact that hydrological events, specifically the rainfall-runoff process, can be reproduced through the use of a physical model with certain limitations, but enabling repetition through experimentation. Second, empirical relations can be found between the topographic characteristics of a watershed and the hydrograph parameters of peak flow, time to peak, base time, and volume and that these relations can be as simple as second or third order polynomials. Third, the runoff behavior or the complete shape of the hydrograph can be described based only on topographic information of the watershed. The Fourth and main finding is that a unit hydrograph can be described based only on the watershed area when fuzzy logic models are applied.
11. Materials and Methods
ArcGIS 10 was used for DEM processing and topographical parameters measurement.
Log logistic fit in Minitab 16 was used.
Machine learning in MATLAB 2016b was used for the data analysis of the experiments.
Excel + VB 2010 was used for the visual analysis of the experimental hydrographs.
The R language was used for the estimation of the time of concentration of the experimental watersheds.
Arduino Mega was used as the robot controller.
Power Point for Mac 2011 was used for the physical model drawings.