**1. Introduction**

Low flow research plays a significant role in water management, supporting applications such as aquatic ecosystem protection, irrigation, water supply, and hydroelectricity [1,2]. Applying hydrological models to low flow analysis is essential, especially for basins lacking discharge data [3]. However, hydrological models simplify the water cycle and include parameters that cannot be measured directly [4]. Therefore, before a model is applied in a region of interest, it must be calibrated to optimize its parameters [5]. As the climate changes, growing scientific effort is devoted to assessing hydrological changes under future scenarios. Well-calibrated models are imperative for reducing the uncertainty of these predictions [6].

Model calibration is the process of identifying a suitable model parameter set that minimizes the difference between the simulated and observed values, as measured by the objective function [7]. Thus, a well-chosen objective function is the backbone of a satisfactory scientific outcome. To understand the influence of objective functions and improve model simulation, considerable research has been carried out in recent decades (e.g., [8–11]). The most critical improvement is the replacement of a single objective with multiple objectives (e.g., [12,13]), which has made multi-objective calibration widely used in water resource applications, especially for hydrological simulations [14]. Efstratiadis and Koutsoyiannis [9] reviewed different case studies of multi-objective applications in hydrology and found that the multi-objective approach improved the identifiability of parameters in complex parameterization.

**Citation:** Yang, X.; Yu, C.; Li, X.; Luo, J.; Xie, J.; Zhou, B. Comparison of the Calibrated Objective Functions for Low Flow Simulation in a Semi-Arid Catchment. *Water* **2022**, *14*, 2591. https://doi.org/10.3390/w14172591

Academic Editor: Leonardo V. Noto

Received: 22 July 2022 Accepted: 19 August 2022 Published: 23 August 2022


**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Even though many studies have applied multi-objective functions in hydrological model calibration, studies focusing on low flow analysis are limited. Shafii and De Smedt [15] calibrated the WetSpa model by combining the normal and log-transformed Nash–Sutcliffe Efficiency (NSE) as the objective function and found that a compromise giving equal attention to high flows and low flows is possible. Kim [16] also applied the normal and log-transformed NSE in the objective function to emphasize both high and low flows in a hydrograph and concluded that this combination worked better. Garcia et al. [3] conducted a comprehensive evaluation of low flow simulations with different objective functions in hundreds of French basins, but applied the inverse transformation to make the objective functions sensitive to low flows. Their results recommended combining the normal and inverse transformed Kling–Gupta Efficiency (KGE). Apart from transformed objective functions based on time series, a growing number of studies include hydrological signatures in the objective function. Comparing time series-based and Flow Duration Curve (FDC)-based transformed objective functions, Garcia et al. [3] found that the FDC-based transformation performs worse than the time series-based objective function for simulating low flow indices. In contrast, Lombardi et al. [17] deduced from 52 Italian catchments that matching FDC statistics in the calibration outperformed time domain calibration in reproducing the low-to-average flow quantiles. Consistent with Lombardi et al. [17], Chilkoti et al. [14] found that including FDC-based signatures in objective functions could improve low flow simulation, based on the calibration of a SWAT model in a small snow-fed catchment.
From the above studies, there is no consistent answer to the question of whether including FDC-based signatures helps low flow simulation. In addition, the above studies were conducted in humid regions, and little attention has been paid to relatively arid areas.

To improve the understanding of how calibrated objective functions behave in relatively arid regions, this study proposes a comprehensive evaluation of eight different objective functions in a semi-arid Chinese basin. The evaluated objective functions cover varied formats, transformations, and bases, and are compared from three aspects: the hydrograph simulation, the FDC simulation, and the low flow indices. To further explore the climatic influence on the objective functions, different climatic conditions are also considered in the evaluation.

#### **2. Study Area and Data**

#### *2.1. Study Area*

The study was conducted in the Bahe basin of China, which is in the northern part of the Qinling Mountains. The Ma Du Wang (MDW) hydrologic station, located downstream in the Bahe basin, was selected; it is the control station of the watershed before the Bahe River flows into the Weihe River. There is no large reservoir in the catchment. The catchment area is about 1760 km<sup>2</sup> (see Figure 1), the average elevation is 1170 m, and the land use is dominated by agriculture and forest. The average annual precipitation in the Bahe region is about 720 mm, and nearly 60% of precipitation occurs between July and October. Precipitation is the primary source of runoff, and the summer runoff accounts for more than 40% of annual runoff. According to the Köppen–Geiger climate classification, the watershed controlled by the MDW station belongs to the Dwa and Dwb classes: monsoon-influenced hot/warm summer, semi-arid continental climate.

**Figure 1.** The location and Digital Elevation Model (DEM) information of the study area.

#### *2.2. Data*

The meteorological data in this study come from the National Meteorological Information Centre (NMIC) and use the same station information as He et al. [18]. In addition, the areal mean precipitation and ET are spatially interpolated by Simple Kriging, as in He et al. [18]. The runoff data are at the daily time scale and were obtained from the Yellow River Conservancy Commission (YRCC).

#### **3. Methods**

For this comparative analysis, a conceptual hydrological model, Xin An Jiang (XAJ), is calibrated with different objective functions under varied climates. More detailed information is given below.

#### *3.1. Hydrological Model and Model Optimization*

In this study, the XAJ model, a conceptual rainfall-runoff model at a daily time step, is selected. The XAJ model was developed for relatively humid regions in China by Zhao et al. [19,20] and has become widely used in runoff simulation, water resources assessment, and climate change assessments (e.g., [21]). The model has been validated in this study area [22], and the model structure applied here is the same as in Lin et al. [23], where a detailed model description can be found.

To optimize the hydrological model parameter set, an effective global optimization algorithm, the shuffled complex evolution (SCE-UA) algorithm, was used in this study. This algorithm is mainly based on the concepts of information sharing and natural biological evolution [24,25]. It has been widely used in hydrological model calibration (e.g., [26,27]).

#### *3.2. Calibration Objective Functions*

Based on a summary of the objective functions in current use, three different classes of objectives were evaluated; Table 1 gives more detailed information. Regarding the criteria, both NSE and KGE are widely used in hydrology, but KGE is free of the influence of unhelpful interactions among components [28]. Therefore, KGE has been analyzed and recommended by many studies (e.g., [29–31]) and is applied here.

The calculation of KGE follows Equation (1):

$$\text{KGE} = 1 - \sqrt{(\text{r} - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2} \tag{1}$$

With

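$$\begin{cases} \text{r} = \dfrac{1}{N} \sum_{i=1}^{N} \dfrac{(Q_{o,i} - \mu_o)(Q_{s,i} - \mu_s)}{\sigma_o \sigma_s} \\ \alpha = \dfrac{\mu_s}{\mu_o} \\ \beta = \dfrac{\sigma_s}{\sigma_o} \end{cases} \tag{2}$$

where $Q_{o,i}$ and $Q_{s,i}$ are the observed and simulated discharges at time step $i$, $N$ is the number of time steps, and $\mu$ and $\sigma$ are the mean and standard deviation of the observed (subscript $o$) and simulated (subscript $s$) series.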
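Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `kge` and the use of NumPy are assumptions, and the symbols follow Equation (2) as written ($\alpha$ for the ratio of means, $\beta$ for the ratio of standard deviations).

```python
import numpy as np


def kge(q_obs, q_sim):
    """Kling-Gupta Efficiency following Equations (1) and (2).

    Hypothetical helper for illustration; q_obs and q_sim are
    equally long sequences of observed and simulated discharge.
    """
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)
    mu_o, mu_s = q_obs.mean(), q_sim.mean()
    # Population standard deviation (1/N), consistent with the 1/N in r.
    sigma_o, sigma_s = q_obs.std(), q_sim.std()
    r = np.mean((q_obs - mu_o) * (q_sim - mu_s)) / (sigma_o * sigma_s)
    alpha = mu_s / mu_o        # ratio of means, as named in Equation (2)
    beta = sigma_s / sigma_o   # ratio of standard deviations
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

A perfect simulation gives KGE = 1. A simulation that doubles every observed value keeps r = 1 but gives α = β = 2, so KGE = 1 − √2.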

As shown in Table 1, the evaluated functions include single objectives, multi objectives, and split objectives. The single objective class, which includes OBJ1 and OBJ2, applies two different transformations to the discharge in the KGE calculation. These transformations emphasize the goodness of fit for low flows: one is the logarithmic transformation [5,32,33] and the other is the inverse transformation [34]. The multi objective class, from OBJ3 to OBJ6, follows the format of Garcia et al. [3], who combined the normal KGE with the inverse transformation-based KGE using equal weights, for both the time series-based and the FDC-based series. Additionally, this study includes the logarithmic transformation-based counterparts to explore the influence of the transformation choice. The last class follows the recommendation of Fowler et al. [35], who proposed the split KGE as the objective function and found that it could significantly improve model performance. To test this strategy, the split KGE is set as OBJ7. OBJ8, proposed in this study, applies the same strategy to the objective function suggested by Garcia et al. [3] (OBJ4). Given these definitions, the evaluated objective functions are connected or similar in pairs, which helps to explore their characteristics by pairwise comparison.

**Table 1.** The information of evaluated objective functions in this study.


#### *3.3. Model Performance Assessment*

#### 3.3.1. Climatic Robustness Assessment

The Differential Split-Sample Test (DSST) is applied to test the objective functions, where two independent periods are in different conditions [36]. According to the statistical climate analysis, the climate will be drier in the future in this region [18]. Considering this, the model is calibrated with a relatively wet climate and validated in a dry climate.

Figure 2 displays the precipitation from 1998 to 2019 at the MDW station. The period 2000–2002 is the only continuous period in which the annual precipitation is below the average in every year; its annual precipitation is around 540 mm. To provide more valuable information for applications, as considered above, the period 2000–2002 is therefore set as the validation period. Correspondingly, a relatively humid period is chosen for calibration. In the plot, 2003–2005 shows the highest 3-year mean precipitation (about 681 mm per year) and is therefore used as a calibration period. To increase the climatic robustness, the period 2007–2009 is also set as a calibration period, since its annual precipitation (about 661 mm per year) is higher than the average in each year. In summary, two relatively humid periods (2003–2005 and 2007–2009) are used for calibration, and a relatively arid period (2000–2002) is used for validation.
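The period selection described above was made from Figure 2. The same logic can be sketched in code: find the 3-year window with the highest mean precipitation, and find the windows in which every year is below the long-term average. The function names and the precipitation values below are hypothetical and are not the MDW record.

```python
import numpy as np


def window_stats(years, annual_p, width=3):
    """Return (first_year, last_year, mean_p, all_below_mean) per window."""
    years = np.asarray(years)
    annual_p = np.asarray(annual_p, dtype=float)
    long_term_mean = annual_p.mean()
    stats = []
    for i in range(len(years) - width + 1):
        p = annual_p[i:i + width]
        stats.append((
            int(years[i]),
            int(years[i + width - 1]),
            float(p.mean()),
            bool(np.all(p < long_term_mean)),
        ))
    return stats


def select_periods(years, annual_p, width=3):
    """Pick the wettest window and list the consistently dry windows."""
    stats = window_stats(years, annual_p, width)
    wettest = max(stats, key=lambda s: s[2])
    dry = [s for s in stats if s[3]]
    return wettest[:2], [s[:2] for s in dry]


# Hypothetical annual precipitation totals in mm (NOT the MDW record).
years = list(range(2000, 2008))
annual_p = [500, 520, 540, 700, 690, 650, 600, 610]
calibration, validation_candidates = select_periods(years, annual_p)
print("calibration window:", calibration)
print("validation candidates:", validation_candidates)
```

In a differential split-sample setting, the wettest window serves as a calibration period and a consistently dry window as the validation period, mirroring the choice of 2003–2005 and 2000–2002 in the text.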

**Figure 2.** The precipitation (P) information at the Ma Du Wang station; the blue and red windows mark the calibration periods, and the yellow window marks the validation period.

#### *3.4. Assessment Criteria*

Paying more attention to the low flow simulation should not come at the cost of general performance. Therefore, the evaluation criteria used to compare the objective functions in this study cover both the general and the low flow simulation. Table 2 shows the applied assessment criteria, including many low flow indices used in hydrology [2,37]. For instance, the logarithmic transformed criteria have been widely used in studies; they show the overall goodness of fit but emphasize low flow [4,5,33]. Another class of low flow indices measures the low flow severity at different time scales, which is of more concern to water management agencies; an example is the mean annual 3-day minimum discharge. Moreover, FDC statistics are increasingly used, since they provide valuable information in the frequency domain [11,38]. In this class, the LFD, Q95, and Q75 are applied in the study.
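Two of the index families mentioned above can be sketched as follows: the mean annual n-day minimum discharge (for example, MAM3 for n = 3) and FDC quantiles such as Q95 and Q75. This is a sketch under assumptions: the function names are hypothetical, years are treated as calendar years with moving averages that do not cross year boundaries, and Qp is taken as the flow equalled or exceeded p% of the time. LFD is omitted because its definition is not given in the text.

```python
import numpy as np


def mean_annual_minimum(q, years, n):
    """Mean over years of the annual minimum n-day moving-average flow."""
    q = np.asarray(q, dtype=float)
    years = np.asarray(years)
    minima = []
    for year in np.unique(years):
        q_year = q[years == year]
        # n-day moving average within the year (no wrap-around).
        moving = np.convolve(q_year, np.ones(n) / n, mode="valid")
        minima.append(moving.min())
    return float(np.mean(minima))


def exceedance_flow(q, p):
    """Flow equalled or exceeded p% of the time (e.g., p=95 for Q95)."""
    return float(np.percentile(np.asarray(q, dtype=float), 100 - p))
```

For daily data, `mean_annual_minimum(q, years, 3)` would give MAM3 and `exceedance_flow(q, 95)` would give Q95, where `years` holds the year of each daily value.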


**Table 2.** The applied criteria of performance evaluation in this study.

#### **4. Results**

#### *4.1. Objective Functions Evaluation*

#### 4.1.1. Hydrograph Simulation

The time series of flow observation presents the temporal change in the water cycle in a basin, which is the base information for hydrological statistical analysis, such as the trend, the seasonality, etc. Therefore, assessing the performance of time series simulation is a vital evaluation aspect for objective functions.

To compare the influence of the objective functions on the time series simulation, Figure 3 displays the probability density function (PDF) of the percent bias (Pbias) for the period 2003–2005. The Pbias here is the model residual divided by the observed flow value, which measures the relative simulation error. In the left subplot, the logarithmic transformed objective functions perform better than the inverse transformed ones, regardless of whether the objective is single or multi. Following the performance classification of Moriasi et al. [39], the days with a good simulation (|Pbias| < 15%) account for 45% and 44% of the calibration period for OBJ1 and OBJ3, followed by OBJ4 with 40%, which is much higher than for OBJ2. The result for acceptable performance (|Pbias| < 25%) supports this finding, with 61%, 60%, and 57% of days achieved by OBJ1, OBJ3, and OBJ4, respectively. These results also indicate that the difference between the single objectives (OBJ1 and OBJ2) is much larger than that between the multi objectives (OBJ3 and OBJ4). In the middle subplot, which shows the results of the multi objectives, the probability of achieving a smaller Pbias appears higher for the objectives based on the time series than for those based on the FDC. For instance, for OBJ5 and OBJ6, the days showing a good performance account for 45% and 26%, respectively, and the values change to 62% and 48% for acceptable performance. The right subplot shows the results from all three kinds of objectives. OBJ4 presents a better result than the others, which means the split objective functions did not improve the simulation of the hydrological time series. Between the two split objective functions, OBJ8 performs slightly better, with 2% and 1% more days achieving good and acceptable performance than OBJ7, respectively.
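The daily Pbias and the share of days in each performance class, as used in this comparison, can be sketched as follows. The function names, the sign convention (simulated minus observed), and the exclusion of days with zero observed flow are assumptions made for this illustration.

```python
import numpy as np


def daily_pbias(q_obs, q_sim):
    """Per-day percent bias: residual divided by the observed flow."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)
    valid = q_obs > 0  # assumption: skip days with zero observed flow
    return 100.0 * (q_sim[valid] - q_obs[valid]) / q_obs[valid]


def share_within(pbias, threshold):
    """Fraction of days with |Pbias| below the threshold (in percent)."""
    return float(np.mean(np.abs(pbias) < threshold))


# Hypothetical flows in m3/s, for illustration only.
pbias = daily_pbias([10, 10, 10, 10], [11, 12, 13, 8])
good = share_within(pbias, 15)        # |Pbias| < 15%
acceptable = share_within(pbias, 25)  # |Pbias| < 25%
```

With these four hypothetical days, the daily Pbias values are 10%, 20%, 30%, and −20%, so one day in four is classed as good and three in four as acceptable.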

Figure 4 shows the same information as Figure 3, but for the calibration during 2007–2009. Although the general characteristics are in line with Figure 3, minor differences exist. A clear distinction again appears between the single objectives (OBJ1 and OBJ2), but the difference between the multi objectives (see the middle subplot) is smaller than in the period 2003–2005. Statistically, the days with good performance account for 45%, 44%, 36%, and 35%, and the values change to 64%, 58%, 53%, and 50% for acceptable performance, for OBJ3, OBJ4, OBJ5, and OBJ6, respectively.


**Figure 3.** The probability density function (PDF) comparison for the objective functions, evaluated by the percent bias (Pbias) during the calibration period 2003–2005.

**Figure 4.** The probability density function (PDF) comparison for the objective functions, evaluated by the percent bias (Pbias) during the calibration period 2007–2009.

The goodness of hydrograph fit is also an essential measure for flow time series simulation and has frequently been evaluated by KGE in recent years. Since this study focuses more on low flow, the logarithmic transformed KGE results (KGElog) are also included. Table 3 presents the calculated values of KGE and KGElog during the two calibration periods for all eight objective functions. In the table, the highest values for each period among the objective functions are highlighted in bold, and '/' is used when the value is lower than 0.

**Table 3.** The calibrated KGE and KGElog values during two calibration periods.

| Objective function | KGE (2003–2005) | KGE (2007–2009) | KGElog (2003–2005) | KGElog (2007–2009) |
|---|---|---|---|---|
| OBJ1 | 0.85 | 0.63 | **0.78** | **0.84** |
| OBJ2 | 0.60 | 0.25 | / | / |
| OBJ3 | 0.90 | **0.78** | 0.77 | 0.83 |
| OBJ4 | **0.92** | **0.78** | 0.70 | 0.79 |
| OBJ5 | 0.89 | 0.62 | 0.74 | 0.69 |
| OBJ6 | 0.90 | 0.55 | 0.68 | 0.61 |
| OBJ7 | 0.85 | 0.68 | / | / |
| OBJ8 | 0.74 | 0.69 | / | / |

Note: '/' is used when the value is lower than 0.

Looking at the KGE values, almost all objective functions show an acceptable performance in both calibration periods, but with considerable differences. For example, during 2003–2005, the highest KGE is 0.92 from OBJ4 and the lowest is 0.60 from OBJ2, while the difference among the multi objectives is slight (0.03). Comparing the three objective classes, the multi objectives show relatively higher KGE values, followed by the split objectives and the single ones. For the low flow simulation assessed by KGElog, three objectives produce values lower than 0, which is unacceptable. At the same time, all the multi objective functions provide good results, with KGElog values of at least 0.61. The highest KGElog value appears for OBJ1, mainly because the evaluation criterion is the same as the objective function; the KGElog values for OBJ3 are very close to those of OBJ1.

Considering the balance between general and low flow simulation over the two periods, OBJ3 and OBJ4 yield relatively better results, followed by OBJ1. Taking the average of the KGE and KGElog values over the two periods as an example, the result is 0.821 and 0.797 for OBJ3 and OBJ4, respectively, followed by 0.773 for OBJ1. Among the multi objectives, regardless of whether they are time series-based or FDC-based, the logarithmic transformed objectives tend to yield higher averages than the inverse transformed objectives. The corresponding average is 0.736 for OBJ5 and 0.685 for OBJ6.
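The averaged measure quoted above can be checked against the Table 3 entries. The sketch below averages the four values per objective (KGE and KGElog for each of the two calibration periods). Because the table entries are rounded to two decimals, the recomputed averages may differ from the reported ones in the third decimal; the dictionary names are hypothetical.

```python
# Table 3 entries: KGE 2003-2005, KGE 2007-2009,
# KGElog 2003-2005, KGElog 2007-2009 (objectives with '/' are omitted).
table3 = {
    "OBJ1": (0.85, 0.63, 0.78, 0.84),
    "OBJ3": (0.90, 0.78, 0.77, 0.83),
    "OBJ4": (0.92, 0.78, 0.70, 0.79),
    "OBJ5": (0.89, 0.62, 0.74, 0.69),
    "OBJ6": (0.90, 0.55, 0.68, 0.61),
}

# Averages reported in the text, for comparison.
reported = {"OBJ1": 0.773, "OBJ3": 0.821, "OBJ4": 0.797,
            "OBJ5": 0.736, "OBJ6": 0.685}

averages = {name: sum(vals) / len(vals) for name, vals in table3.items()}
ranking = sorted(averages, key=averages.get, reverse=True)

for name in ranking:
    diff = averages[name] - reported[name]
    print(f"{name}: recomputed {averages[name]:.4f}, "
          f"reported {reported[name]:.3f}, difference {diff:+.4f}")
```

The recomputed ranking (OBJ3, OBJ4, OBJ1, OBJ5, OBJ6) matches the order given in the text, and each recomputed average agrees with the reported value to within rounding of the table entries.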

#### 4.1.2. Flow Duration Curves

Unlike the time series evaluation, FDC statistics provide valuable frequency domain information. Figure 5 presents the overall FDC assessment for the eight objective functions during 2003–2005; each subplot contains two zoomed subplots that present the results for high and relatively low flows more clearly.

**Figure 5.** The result of the observed and simulated FDCs by all objective functions during the calibration period 2003–2005. The zoomed plots in each subplot show the result for the highest 10% flow (**left**) and the lowest 50% flow (**right**) simulations.

According to the left subplot, the simulated FDC from OBJ2 is far from all other curves, including the observed one. In the two zoomed subplots, OBJ2 substantially overestimates the observation for the highest 10% of flows and heavily underestimates it for the lowest 50% of flows, while the curves from OBJ1 and OBJ3 are closer to the observation, especially for low flows. Compared with the left subplot, the multi objectives evaluated in the middle subplot produce more similar FDC simulations, especially for the high flows. In the zoomed low flow subplot, OBJ5 presents the FDC simulation closest to the observation, followed by OBJ3, and OBJ6 is the furthest. In the right subplot, all simulated curves differ visibly, and the differences between curves are larger than in the left subplot. Among these objective functions, OBJ4 provides the closest simulation; the split objective functions behave similarly to OBJ2.

Figure 6 presents the results in the same way as Figure 5 but for the calibration during 2007–2009. The general characteristics agree with the findings from Figure 5, regardless of the scale difference. For instance, the curve simulated by OBJ2 stays visibly far from the observed curve, and OBJ5 and OBJ3 yield the simulated curves closest to the observation. However, the curves from the multi objectives stay close to each other.

**Figure 6.** The result of the observed and simulated FDCs by all objective functions during the calibration period 2007–2009. The zoomed plots in each subplot show the result for the highest 10% flow (**left**) and the lowest 50% flow (**right**) simulations.
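The flow duration curves compared in Figures 5 and 6, including the zoomed segments for the highest 10% and lowest 50% of flows, can be sketched as follows. The function names and the Weibull plotting position used for the exceedance probability are assumptions for illustration; the text does not state which plotting position was used.

```python
import numpy as np


def flow_duration_curve(q):
    """Return (exceedance probability in %, flows sorted descending)."""
    q_sorted = np.sort(np.asarray(q, dtype=float))[::-1]
    n = q_sorted.size
    # Assumption: Weibull plotting position, rank / (n + 1).
    exceedance = 100.0 * np.arange(1, n + 1) / (n + 1)
    return exceedance, q_sorted


def fdc_segment(exceedance, q_sorted, lower, upper):
    """Part of the FDC with exceedance between lower and upper (in %)."""
    mask = (exceedance >= lower) & (exceedance <= upper)
    return exceedance[mask], q_sorted[mask]


# Hypothetical flows in m3/s, for illustration only.
exceedance, q_sorted = flow_duration_curve([3, 1, 2, 5, 4, 9, 8, 7, 6])
_, high_flows = fdc_segment(exceedance, q_sorted, 0, 10)    # highest 10%
_, low_flows = fdc_segment(exceedance, q_sorted, 50, 100)   # lowest 50%
```

Applying the same two functions to the observed and simulated series gives curves that can be overlaid, which is the comparison the figures make.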

#### 4.1.3. Low Flow Indices 4.1.3. Low Flow Indices

Since this study emphasizes low flow simulation, different low flow indices are thus applied in Figure 7, where the line shows the observed value and the bar presents the simulated value. Since this study emphasizes low flow simulation, different low flow indices are thus applied in Figure 7, where the line shows the observed value and the bar presents the simulated value. *Water* **2022**, *14*, 2591 10 of 17

Through all subplots, the objective functions provide similarly good simulations to

**Figure 7.** The observed (the line) and simulated (the bars) low flow indices by all objective functions during the calibration period 2003–2005. **Figure 7.** The observed (the line) and simulated (the bars) low flow indices by all objective functions during the calibration period 2003–2005.


Across all subplots, the objective functions simulate the observed LFD similarly well. Apart from LFD, the objectives OBJ2, OBJ7, and OBJ8 vastly underestimate the other observed low flow indices and are not comparable with the other evaluated objectives. Among the remaining objectives, the inverse transformed objectives (OBJ4 and OBJ6) estimate the indices visibly lower than the logarithmic transformed ones (OBJ1, OBJ3, and OBJ5). For example, the average estimate for MAM3 is about 1.3 m³/s from the inverse transformed objectives, compared with about 2.2 m³/s from the logarithmic transformed objectives.

In terms of performance, the inverse transformed objectives better estimate the indices sensitive to the extreme low flows (e.g., MAM3, MAM10, and Q95). According to the subplot for MAM10, the averaged estimation error is about 0.3 m³/s from the inverse transformed objectives, which climbs to 0.8 m³/s for their logarithmic counterparts. Conversely, the logarithmic transformed objectives provide a better estimation for the indices less sensitive to the extreme low flows (e.g., MAM30 and Q75). In the subplot for MAM30, the averaged estimation error is about 1.4 m³/s from the inverse transformed objectives, which is about 13 times that of their logarithmic counterparts.
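The indices compared above can be illustrated with a minimal sketch. It assumes the common definitions (MAMn as the mean annual minimum of the n-day moving-average flow, Qp as the flow equalled or exceeded p percent of the time); the function names `mam` and `q_exceed`, the grouping of flows by year, and the synthetic data are hypothetical and are not taken from the study, whose exact computation (e.g., calendar versus hydrological years) may differ. LFD is not covered here.

```python
import numpy as np


def mam(daily_flow_by_year, n):
    """Mean annual minimum of the n-day moving-average flow (assumed definition)."""
    minima = []
    for flow in daily_flow_by_year:
        flow = np.asarray(flow, dtype=float)
        # n-day moving average; "valid" keeps only full n-day windows
        moving_avg = np.convolve(flow, np.ones(n) / n, mode="valid")
        minima.append(moving_avg.min())
    return float(np.mean(minima))


def q_exceed(daily_flow, p):
    """Flow equalled or exceeded p percent of the time (p=95 gives Q95)."""
    flow = np.asarray(daily_flow, dtype=float)
    # Exceeded p% of the time corresponds to the (100 - p)th percentile
    return float(np.percentile(flow, 100 - p))


# Synthetic two-year example (illustrative values only, in m3/s)
rng = np.random.default_rng(0)
observed = [rng.gamma(2.0, 3.0, 365) for _ in range(2)]
simulated = [year * 0.9 for year in observed]

for n in (3, 10, 30):
    error = abs(mam(simulated, n) - mam(observed, n))
    print(f"MAM{n} absolute estimation error: {error:.2f}")
```

Averaging such absolute errors over a group of objectives would give a quantity analogous to the "averaged estimation error" quoted in the text.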

Figure 8 summarizes the same information as Figure 7 but for the calibration period 2007–2009. Some characteristics are the same as in Figure 7: all objectives give similar simulations for LFD; OBJ2, OBJ7, and OBJ8 give unacceptable estimates; and the logarithmic transformed objectives give higher estimates than their inverse transformed counterparts. However, the relative performance differs from Figure 7 in two respects. First, OBJ6 produces a minor estimation error for MAM3 and MAM10, while OBJ5 yields almost exactly the observed Q95. This result does not support the finding from Figure 7 that the inverse transformed objectives produce a better estimation for the indices sensitive to the extreme low flows. Second, OBJ4 and OBJ5 provide the closest estimates to the observed MAM30 and Q75, respectively. This result is not consistent with the finding that the logarithmic transformed objectives provide a better estimation for the indices less sensitive to the extreme low flows.

**Figure 8.** The observed (the line) and simulated (the bars) low flow indices by all objective functions during the calibration period 2007–2009.

#### *4.2. Climatic Robustness Assessment*

As mentioned above, the DSST method is applied to assess the climatic robustness of the objectives. To enhance the reliability and applicability of the findings, the climatic robustness evaluation validates, in a relatively dry climate, the calibration results achieved in two different wet climate periods.

#### 4.2.1. Hydrograph Simulation

Figure 9 displays the observed and simulated hydrographs during the validation period for the evaluated objectives, except for OBJ2, OBJ7, and OBJ8, which are excluded because of their poor calibration.

At first sight, even though the objectives are different, all simulations follow the temporal pattern of the observation, and no apparent time lags appear in either subplot. In the upper subplot, OBJ5 shows a relatively better estimation for high flows, especially the peaks, followed by OBJ4, and the other objectives are comparable. In the lower subplot, OBJ4 tends to overestimate the high flows, except for the peak flow. The remaining objectives present similar simulations for most time steps, except OBJ5 for some high flows. Comparing the two periods, the hydrographs estimated from the calibration period 2003–2005 are generally closer to the observation than those from the period 2007–2009.

Because the simulated hydrographs overlap heavily, the evaluation statistics (KGE and KGElog) are presented in Table 4 to support the evaluation of the hydrograph simulations.
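For readers unfamiliar with the two statistics, the following is a minimal sketch of how they could be computed. It assumes the widely used Kling-Gupta efficiency formulation, KGE = 1 − sqrt((r − 1)² + (α − 1)² + (β − 1)²), and takes KGElog to be the same statistic applied to log-transformed flows; the study may use a different variant, and the function names and the `eps` offset are hypothetical.

```python
import numpy as np


def kge(sim, obs):
    """Kling-Gupta efficiency (assumed standard formulation)."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))


def kge_log(sim, obs, eps=0.0):
    """KGE on log-transformed flows, which puts more weight on low flows.

    eps is a hypothetical small offset for series that contain zero flows;
    with eps=0.0 all flows must be strictly positive.
    """
    sim = np.asarray(sim, dtype=float) + eps
    obs = np.asarray(obs, dtype=float) + eps
    return kge(np.log(sim), np.log(obs))
```

A perfect simulation gives a value of 1 for both statistics; because the logarithm compresses high flows, KGElog is the more informative of the two for low flow evaluation.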

**Figure 9.** The hydrograph plot during the validation period based on the calibration period (**a**) 2003–2005 (**b**) 2007–2009.

**Table 4.** The validated KGE and KGElog values yielded by the different calibrated models.

| Objective | KGE (calibration 2003–2005) | KGElog (calibration 2003–2005) | KGE (calibration 2007–2009) | KGElog (calibration 2007–2009) |
|---|---|---|---|---|
| OBJ1 | 0.61 | 0.67 | 0.42 | 0.69 |
| OBJ3 | 0.68 | **0.70** | 0.58 | 0.68 |
| OBJ4 | 0.68 | 0.62 | **0.61** | **0.71** |
| OBJ5 | **0.79** | 0.67 | 0.58 | 0.63 |
| OBJ6 | 0.61 | 0.64 | 0.49 | 0.67 |

Most of the objectives produce acceptable validation results based on both calibration periods. According to the values shown in the table, all the KGElog values are at least 0.62, and most of the KGE values are at least 0.58. As shown in bold, the highest values for both criteria all appear in the multi-objective group, and they all reach 0.70 or more, except the KGE value for 2007–2009 (0.61). Between the two calibration periods, all the KGE values based on the model calibrated during 2003–2005 are higher than those based on 2007–2009, but the KGElog values are comparable. For example, with OBJ1, the KGE value based on 2003–2005 is 0.19 higher than that based on 2007–2009, but the difference between the two KGElog values is only 0.02. Focusing on the low flow simulation across both validation results, and taking the averaged KGElog value as the measure, OBJ3 presents the best performance, with an averaged KGElog value of 0.69.
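The ranking by averaged KGElog can be checked directly from the Table 4 values. The short sketch below only repeats that arithmetic; the variable names are illustrative.

```python
# KGElog values from Table 4: (calibration 2003-2005, calibration 2007-2009)
kge_log_validation = {
    "OBJ1": (0.67, 0.69),
    "OBJ3": (0.70, 0.68),
    "OBJ4": (0.62, 0.71),
    "OBJ5": (0.67, 0.63),
    "OBJ6": (0.64, 0.67),
}

# Average the two validation results for each objective
mean_kge_log = {
    name: sum(values) / len(values)
    for name, values in kge_log_validation.items()
}

# Objective with the highest averaged KGElog
best = max(mean_kge_log, key=mean_kge_log.get)

for name, value in mean_kge_log.items():
    print(f"{name}: {value:.3f}")
print("Best averaged KGElog:", best)
```

The averages are close to one another (between 0.65 and 0.69), so the ranking reflects small differences.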

#### 4.2.2. Flow Duration Curves

As mentioned above, FDC statistics can provide additional information to the time series simulation. Thus, the validation also includes the FDC assessment. Figure 10 presents the corresponding results; the left and right panels show the results based on the models calibrated during 2003–2005 and 2007–2009, respectively.
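As a reminder of what an FDC comparison involves, here is a minimal sketch of building an empirical flow duration curve and extracting the two zoomed ranges (highest 10% and lowest 50% of flows). The Weibull plotting position i/(n + 1) and the function names are assumptions for illustration, not necessarily the choices made in the study.

```python
import numpy as np


def flow_duration_curve(daily_flow):
    """Return (exceedance probability in %, flows sorted in descending order)."""
    flow = np.sort(np.asarray(daily_flow, dtype=float))[::-1]
    n = flow.size
    # Weibull plotting position (assumed): rank / (n + 1), expressed in percent
    exceedance = 100.0 * np.arange(1, n + 1) / (n + 1)
    return exceedance, flow


def fdc_segment(exceedance, flow, lower, upper):
    """Part of the curve with lower <= exceedance probability <= upper (in %)."""
    mask = (exceedance >= lower) & (exceedance <= upper)
    return exceedance[mask], flow[mask]


# Illustrative use with synthetic flows
rng = np.random.default_rng(1)
exceedance, flow = flow_duration_curve(rng.gamma(2.0, 3.0, 365))
_, high_flows = fdc_segment(exceedance, flow, 0.0, 10.0)    # highest 10% flow
_, low_flows = fdc_segment(exceedance, flow, 50.0, 100.0)   # lowest 50% flow
print(high_flows.size, low_flows.size)
```

Applying the same functions to the observed and simulated series gives curves that can be compared segment by segment, which removes the timing information that dominates a hydrograph comparison.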

**Figure 10.** The observed and simulated FDCs for the validation period based on the calibration in (**a**) 2003–2005 (**b**) 2007–2009. The zoomed plots in each subplot show the result for the highest 10% flow (**left**) and the lowest 50% flow (**right**) simulations.

Overall, the simulated FDCs from all the objectives are comparable and not far from the observed FDC, consistently in both periods. In the simulations of the highest 10% flow and the lowest 50% flow, shown in the left and right zoomed subplots, respectively, the difference between objectives is large for the low flow simulations. Among the low flow simulation objectives, the FDC from OBJ5 appears closest to the observed curve in subplot (a), whereas OBJ4 is closest in subplot (b). In contrast, the simulated FDCs from OBJ1 in subplot (a) and OBJ6 in subplot (b) are clearly distant from the observation. In addition, the simulated FDCs tend to be higher than the observation in subplot (a), while they fall on both sides of the observation in subplot (b).

#### 4.2.3. Low Flow Indices

To further explore the influence of the objectives on low flow simulation, Figure 11 displays the observed and simulated low flow indices during the validation period, obtained by applying the models calibrated on the different periods. Across the validated simulations, there is no apparent conflict between the results for the different calibration periods; therefore, the results for both periods are described together below.

**Figure 11.** The observed and simulated low flow indices by all objective functions during the validation period.

The results are consistent with those from the calibration period. First, OBJ2, OBJ7, and OBJ8 provide significantly different and worse estimations than the other objectives for all evaluated low flow indices except LFD. Second, all the objectives based on both calibration periods give similar simulations of LFD, which are on average about 1.2 days longer than the observation. Third, the remaining logarithmic transformed objectives (OBJ1, OBJ3, and OBJ5) provide relatively higher estimates than their inverse transformed counterparts (OBJ4 and OBJ6), except for the Q95 simulation.

In the upper subplots, the inverse transformed objectives appear closer to the observation, even though the observed values of the three indices clearly differ. For instance, the estimated MAM30 from OBJ6 is only about 0.1 lower than the observation. For the quantile indices, the simulations from the remaining objectives (OBJ1, OBJ3, OBJ4, OBJ5, and OBJ6) are comparable, especially for Q95, for which the range between those objectives is smaller than 0.5. Another interesting point is that, for the logarithmic transformed objectives, the difference between the single-objective and multi-objective functions is relatively smaller for the extreme low flow indices. For example, the difference between the simulations from the three objectives is about 0.5 for MAM3, which increases to about 1 for MAM30.

**5. Discussion**

Hydrological models are widely applied in water research and practice, yet it remains unclear which objective functions are suitable for calibrating them for low flow simulation, especially in relatively arid regions. Therefore, a comprehensive evaluation of different kinds of objective functions in relatively dry areas will provide valuable information.
