**1. Introduction**

Hydrological models are widely used to understand catchment processes, and to predict their behavior under different forcing conditions as these model predictions are useful for water resource planning and management at the river basin scale. However, these predictions are jeopardized, when limited spatial data and hydro-meteorological observations are available in a river basin [1]. Most of the river basins in the world are ungauged or poorly gauged [2]. Research interest on hydrological simulations for ungauged or poorly gauged catchments has increased recently, due to the use of distributed modelling, innovative scientific approaches used in model calibration and deriving model inputs from earth observations to assess numerous water-related problems. As an example, an initiative taken by the International Association for Hydrological Sciences (IAHS) on Prediction in Ungauged Basin (PUB) was aimed in achieving reliable prediction in hydrological practice [2].

During the last few decades, developments in remote sensing earth observations have improved data availability for hydrological modelling and water resources applications such as drought monitoring [3,4], flood modelling and risk management [5,6], water level monitoring [7–9], and glaciers and snow cover estimations [10,11]. Such data includes topographic, land cover, and hydro-meteorological products with different spatial (1/3600◦ × 1/3600◦ to 1◦ × 1 ◦ ) and temporal (three hourly to daily, monthly or annual) resolutions. Some of the widely used such global data products include digital elevation models [12–14], land use [15–17], precipitation [18–22], temperature [23,24], evapotranspiration [25–27], soil moisture [28,29], and water storages [13,30]. Advances in modelling techniques, model calibration methods and expansion of the availability of spatial and hydro-meteorological data over the last few decades have enabled the development of more detailed and holistic hydrological models [31]. Nevertheless, these models are only an approximate representation of the real system, hence must be calibrated using some form of observed data.

Hydrological models are usually calibrated by iteratively adjusting the model parameters to obtain a reasonable comparison between simulated and observed hydrographs at a few locations or just at the outlet of the study catchment [32]. Streamflow measurements are point data, and calibrating with only point data at the catchment outlet does not guarantee that spatially varying water balance components such as evapotranspiration are simulated accurately. Another problem in hydrological model calibration is that more than one set of parameter values may result equally good model performance. This issue is commonly referred to as equifinality or non-uniqueness [33,34]. Fenicia et al. (2007) [35] also recognized that model calibration with one variable may risk overfitting of parameter. One way to overcome some of these calibration issues is to calibrate using multi-variables (e.g., streamflow, evapotranspiration, and snow cover), and at multi-sites (e.g., using streamflow data from various streams (or sub-catchments)) within the catchment [36–39]. With the increase in the availability of global datasets, the use of two or more variables in hydrological model calibration has received considerable attention. For example, a number of studies have used remote-sensing (RS)–based evapotranspiration (ET) [40–46], soil moisture [42,47–50], snow cover and glacier mass balance [51,52], and satellite-based land surface temperature (LST) [53] for hydrological model calibration.

Among the aforementioned variables, RS-based ET can be used to constrain the hydrological modelling parameters related to water balance [1,44,45]. Immerzeel and Droogers (2008) [40] used evapotranspiration data derived from MODIS satellite images for calibration of Soil Water Assessment Tool (SWAT) in data-scarce Krishna Basin in India. Their results indicate that spatially distributed ET data at monthly temporal resolution provides a promising alternative to reduce equifinality resulted from lumping the hydrological processes together by traditional calibration with limitedly available streamflow gauge data. Rientjes et al. (2013) [43] used both streamflow and evapotranspiration to calibrate the HBV (Hydrologiska Byråns Vattenbalansavdelning) model for Karkheh Basin in Iran. Their results indicate that multivariable calibration enabled the reproduction of the catchment water balance. However, the results also suggested that streamflow data is also required to identify model simulation errors in gauged or ungauged systems. Using a distributed hydrology-soil-vegetation model (DHSVM) in Jinhua basin in China, Pan et al. (2018) [54] also mentioned that streamflow was simulated well in the single variable calibration, but not evapotranspiration, and that multivariables calibration showed more reasonable estimation of both streamflow and evapotranspiration simulated. Furthermore, in an application of GLEAM ET and ESA CCI (European Space Agency–Climate Change Initiative) surface soil moisture data for calibration of PCR-GLOBWB (PCRaster GLOBal Water Balance) in Oum er Rbia Basin in Morocco, López et al. (2017) [42] showed that multivariable calibration may help to solve the problem of over-parameterization and equifinality. Ha et al. (2018) [55] used four

global ET products (later merging them as one ET product) and leaf area index (LAI) to calibrate SWAT model in ungauged the Day basin in Vietnam and showed reasonable estimation of water balance components and crop coefficients. They suggested that the above method is feasible to calibrate soil, vegetation and eco-hydrological parameters of SWAT, when the basin is ungauged and having complex water distribution.

Despite the aforementioned studies employing RS based ET data, the potential of remote-sensing- -based evapotranspiration data for hydrological model calibration has not been fully explored. This is particularly the case in data poor regions (e.g., many river basins in South-east Asia), where any alternative source of data may bring an added value. In this study, the potential of the RS-based ET data is assessed for calibration in two ways. First, as single variable calibration, in which the hydrological model is calibrated on streamflow and ET separately, but in each case the model performance is analyzed on both variables. Second, as multivariable calibration in which both streamflow and ET are used in calibration together. Moreover, in both cases, the aim of the calibration is not limited to find the 'best parameter set', but also to identify a range of parameter sets, called hereafter 'good parameter sets', that can result in model performances similar or comparable to that of the best parameter set.

#### **2. Data and Methods**

#### *2.1. Study Area*

The Chindwin basin, which is a major sub-basin of the Irrawaddy basin, was used in this study. It is located in the north-western part of Myanmar, and has the drainage area of 110,926 km<sup>2</sup> with its 1100 km long main river. The elevation in the basin varies from about 3800 m above mean sea level (MSL) in the northern forested mountainous region to about 100 m above MSL in the southern part (Figure 1a). More than 80% of the Chindwin basin area is covered with forest. The rest of the basin is primarily covered by cropland, sparsely vegetated or bare land, and water bodies. According to the FAO soil map [56], Acrisols and Cambisols are the dominant soil types found in this river basin (Figure 1b). The Chindwin basin experiences subtropical and tropical climates [57]. Based on the available data from 2001 to 2010 from the meteorological stations (shown in Figure 1), the mean annual rainfall in the basin varies from 3900 mm (Hkamti station in the northern part) to 770 mm (Monywa station in the southern part), and the monthly average temperature varies from 17 ◦C to 40 ◦C (Figure 1c). The area average annual runoff is ~1200 mm at the basin outlet (Monywa Station).

#### *2.2. Datasets Used*

The geo-spatial data used in this study include: (1) 90 m resolution Digital Elevation Model (DEM) from the Shuttle Radar Topography Mission (SRTM) digital topographic data [14], (2) 300 m resolution land-use dataset in 2009 (GlobCover 2009) from the European Space Agency [15], and (3) 7 km resolution soil data (FAO Soil) from the UN Food and Agriculture Organization [56]. The precipitation data used in this study comprised spatially interpolated daily precipitation data from 2001 to 2010. Other meteorological data (for the same period) used included: temperature (at Hkamti, Homalin, Hakha, and Monywa), relative humidity, solar radiation, wind speed, and daily streamflow data at four stations (Hkamti, Homalin, Kalewa, and Monywa). Model calibration was carried at the aforementioned four stations (Figure 1a). More information on model forcing data can be found in Sirisena et al. (2018) [58].

**Figure 1.** Chindwin River Basin, Myanmar (**a**) location and topography, (**b**) land use (**top**) and soil type (**bottom**), and (**c**) climate. Temperature data for Mawlaik and Kalewa are not available for the same period. **Figure 1.** Chindwin River Basin, Myanmar (**a**) location and topography, (**b**) land use (**top**) and soil type (**bottom**), and (**c**) climate. Temperature data for Mawlaik and Kalewa are not available for the same period.

*2.2. Datasets Used* The geo‐spatial data used in this study include: 1) 90 m resolution Digital Elevation Model (DEM) from the Shuttle Radar Topography Mission (SRTM) digital topographic data [14], 2) 300 m resolution land‐use dataset in 2009 (GlobCover 2009) from the European Space Agency [15], and 3) 7 km resolution soil data (FAO Soil) from the UN Food and Agriculture Organization [56]. The precipitation data used in this study comprised spatially interpolated daily precipitation data from 2001 to 2010. Other meteorological data (for the same period) used included: temperature (at Hkamti, Homalin, Hakha, and Monywa),relative humidity, solarradiation, wind speed, and daily streamflow data at four stations (Hkamti, Homalin, Kalewa, and Monywa). Model calibration was carried at the aforementioned four stations (Figure 1a). More information on model forcing data can be found in Sirisena et al. (2018) [58]. The RS‐based ET data were obtained from the Global Land Evaporation Amsterdam Model (GLEAM) [26,59]. GLEAM version 3.0b (GLEAM\_v3.0b) driven only by satellite‐based data The RS-based ET data were obtained from the Global Land Evaporation Amsterdam Model (GLEAM) [26,59]. GLEAM version 3.0b (GLEAM\_v3.0b) driven only by satellite-based data (radiation, air temperature, precipitation, surface soil moisture, vegetation optical depth) spanning over 13 years (2003–2015) at daily time scale with a spatial resolution of 0.25◦ × 0.25◦ . Actual evapotranspiration presented in GLEAM is estimated as the sum of different terrestrial evaporation components such as transpiration, bare-soil evaporation, interception loss, open-water evaporation, and snow sublimation [26]. The GLEAM ET data for the study area indicates that, the monthly average evapotranspiration over the basin varies from 46 mm (minimum) in January to 121 mm (maximum) in July (Figure 2a). The average annual evapotranspiration over the basin is 1009 mm with a standard deviation of 24 mm. The annual evapotranspiration generally decreases from upstream to downstream ranging from 1106 mm to 790 mm (Figure 2b), similar to the precipitation distribution in the basin (Section 2.1). The highest evapotranspiration occurs in the Hkamti sub-basin (sub-basin 1), where most of the land cover is forest. The most downstream of the basin is covered by residential areas, crop lands and mining activities.

(radiation, air temperature, precipitation, surface soil moisture, vegetation optical depth) spanning over 13 years (2003–2015) at daily time scale with a spatial resolution of 0.25° × 0.25°. Actual evapotranspiration presented in GLEAM is estimated as the sum of different terrestrial evaporation components such as transpiration, bare‐soil evaporation, interception loss, open‐water evaporation, and snow sublimation [26]. The GLEAM ET data for the study area indicates that, the monthly average evapotranspiration over the basin varies from 46 mm (minimum) in January to 121 mm (maximum) in July (Figure 2a). The average annual evapotranspiration over the basin is 1009 mm with a standard deviation of 24 mm. The annual evapotranspiration generally decreases from upstream to downstream ranging from 1106 mm to 790 mm (Figure 2b), similar to the precipitation residential areas, crop lands and mining activities.

**Figure 2.** (**a**) Inter‐annual variability of GLEAM ET data over the Chindwin Basin for the period of 2003–2010, and (**b**) average annual evapotranspiration of each sub‐basin. **Figure 2.** (**a**) Inter-annual variability of GLEAM ET data over the Chindwin Basin for the period of 2003–2010, and (**b**) average annual evapotranspiration of each sub-basin.

#### *2.3. Model Setup 2.3. Model Setup*

The Soil and Water Assessment Tool (SWAT) was used to build a hydrological model of the basin. SWAT requires topographic, land use, soil and weather data to run hydrological simulations. The DEM data were used for stream network generation and basin delineation, which is needed to set up a SWAT model. The land‐use, soil data and slope classification derived from the DEM were used for creating Hydrological Response Units (HRUs) within the delineated sub‐basins. HRUs are the primary computational units of SWAT, in which the simulations are governed by the soil water balance equation [60,61]. The Soil and Water Assessment Tool (SWAT) was used to build a hydrological model of the basin. SWAT requires topographic, land use, soil and weather data to run hydrological simulations. The DEM data were used for stream network generation and basin delineation, which is needed to set up a SWAT model. The land-use, soil data and slope classification derived from the DEM were used for creating Hydrological Response Units (HRUs) within the delineated sub-basins. HRUs are the primary computational units of SWAT, in which the simulations are governed by the soil water balance equation [60,61].

Here the Chindwin Basin was subdivided into nine sub‐basins (Figure 2b), and 202 HRUs. The Hargreaves method [61], which requires only temperature data, was used to estimate potential evapotranspiration. Surface runoff was estimated using the Soil Conservation Services–Curve Number (SCS‐CN) method, and channel routing was based on the variable storage method. SWAT simulations were performed at a daily time step for the 2001–2010 period, with a warm‐up period of one year to allow the model to establish a stable initial state. Here the Chindwin Basin was subdivided into nine sub-basins (Figure 2b), and 202 HRUs. The Hargreaves method [61], which requires only temperature data, was used to estimate potential evapotranspiration. Surface runoff was estimated using the Soil Conservation Services–Curve Number (SCS-CN) method, and channel routing was based on the variable storage method. SWAT simulations were performed at a daily time step for the 2001–2010 period, with a warm-up period of one year to allow the model to establish a stable initial state.

#### *2.4. Model Calibration 2.4. Model Calibration*

Model calibration was carried out for streamflow and actual evapotranspiration on a monthly time step using Sequential Uncertainty Fitting Algorithm (SUFI‐2), which is a calibration and uncertainty analysis tool available within the SWAT‐CUP (SWAT‐Calibration and Uncertainty Program) [62]. SUFI‐2 has been widely used in hydrological modelling studies using SWAT [38,44,58,63–70]. Model calibration was carried out for streamflow and actual evapotranspiration on a monthly time step using Sequential Uncertainty Fitting Algorithm (SUFI-2), which is a calibration and uncertainty analysis tool available within the SWAT-CUP (SWAT-Calibration and Uncertainty Program) [62]. SUFI-2 has been widely used in hydrological modelling studies using SWAT [38,44,58,63–70].

Monthly time series of streamflow from four gauging stations and the GLEAM based evapotranspiration from nine sub‐basins were used for calibration. Based on the data availability, calibration for streamflow was performed for 2002–2010, while calibration for evapotranspiration was done for 2003–2010. Selection of model parameters for calibration was based on their sensitivity to streamflow and evapotranspiration. The initial ranges of the selected 22 parameters for calibration are presented in Table S1 in Supplementary material. Model calibration was carried out in several Monthly time series of streamflow from four gauging stations and the GLEAM based evapotranspiration from nine sub-basins were used for calibration. Based on the data availability, calibration for streamflow was performed for 2002–2010, while calibration for evapotranspiration was done for 2003–2010. Selection of model parameters for calibration was based on their sensitivity to streamflow and evapotranspiration. The initial ranges of the selected 22 parameters for calibration are presented in Table S1 in Supplementary Material. Model calibration was carried out in several iterations, with each iteration consisting of 2000 simulations with parameter sets generated using the Latin Hypercube Sampling technique [71]. To go from one iteration to the next, SUFI-2 suggests new ranges for parameter values based on their performance in the current iteration. Occasionally, SUFI- 2 suggested parameter ranges needed to be adjusted when they were beyond the acceptable parameter ranges of SWAT. The Modified Nash–Sutcliffe Efficiency (MNSE, Equation (1) [62]) was used as an objective measure of model calibration performance or objective function. When there is no significant improvement in the MNSE between two successive iterations (<0.02), the calibration process is terminated.

$$\text{MMSE} = 1 - \frac{\sum |Q\_o - Q\_s|}{\sum |Q\_o - \overline{Q\_o}|} \tag{1}$$

where, *Q<sup>o</sup>* and *Q<sup>s</sup>* are observed and simulated streamflow, respectively and *Q<sup>o</sup>* is mean observed streamflow. Similarly, MNSE of ET is calculated based on *ETGLEAM*, *ET<sup>s</sup>* and *ETGLEAM*.

The four streamflow gauging stations (used for calibration) are in series, from upstream to downstream: Hkamti, Homalin, Kalewa and Monywa (Figure 1a). The drainage area for these stations are from sub-basin 1 for Hkamti (27,439 km<sup>2</sup> ), 1 and 2 for Homalin (43,370 km<sup>2</sup> ), 1 to 4 for Kalewa (73,495 km<sup>2</sup> ), and 1 to 9 for Monywa (110,926 km<sup>2</sup> ) (Figure 2b). The calibration was carried out from upstream to downstream by considering one station at a time. Once the calibration is completed for one gauging station, the parameters of the sub-basins corresponding to the gauging station were fixed to their best-fitted values, and the calibration for the next station downstream was undertaken.

Two calibration schemes were used in this study: single variable calibration and multivariable calibration. In the single variable calibration, the model was calibrated for streamflow and evapotranspiration separately. In multivariable calibration, the calibration was carried out for streamflow and evapotranspiration together using a multivariable objective function. The multivariable objective function, here referred to as overall MNSE, is defined as the weighted sum of the MNSEs with respect to streamflow and evapotranspiration.

The multivariable calibration requires weights to be defined for each variable. Different approaches have been tried in a number of previous studies to specify weights to the multivariable objective function. For example, Abbaspour et al. (2007) and Rostamian et al. (2008) [72,73] applied weights inversely proportional to the variance of the observed data used in calibration. Rientjes et al. (2013) and Rode et al. (2007) [43,74] used the coefficient of variation, i.e., standard deviation divided by the mean, instead of the variance. However, assigning equal weights is by far the most commonly adopted approach (e.g., Franco and Bonumá 2017 and Rajib et al. 2016 [41,48]), probably due to the lack of sufficient justification for other approaches. In this study, different weight combinations varying from 0 to 1 (with the sum of two weights equals to 1) with an interval of 0.05 were tested. When the weights for streamflow are between 0.4 and 0.75 and for ET between 0.6 and 0.25 are used, we found reasonable model performance for both streamflow and evapotranspiration. Therefore, here too equal weights were adopted for both variables.

As said earlier, in this study, the calibration aim is not just to look for one 'best parameter set', but also a range of 'good parameter sets' that can produce model results comparable to the best parameter set. This is also one reason why we chose not to split the limited available dataset into calibration and validation. The assumption here is that the parameter ranges defined by the 'good parameter sets' found in the calibration also comprise of parameter sets that can produce similar level of model performance in validation. This assumption can be considered reasonable as long as the inter-annual variability in the validation period is not too different from that in the calibration period. Thus we used all the eight years of data (2003–2010) for which both streamflow and ET are available for calibration, because splitting eight years into calibration and validation would represent only a narrow band of a long-term hydrological cycle in the catchment. However, when the model performance is evaluated on two variables (streamflow and evapotranspiration) as in this study, the model result on the second variable can also be viewed as model validation. Similar examples can be found in several publications that utilized remote-sensing based ET data for hydrological model calibration, e.g., [40,44,46,50,54]. To find the 'good parameter set', we adopted a method described by Finger et al. (2011) [51], where 100 simulations with the highest model performance were used.

Although, only MNSE was used in objective function, the performance of calibrated results were also assessed with other commonly used efficiency criteria such as NSE (Nash–Sutcliffe Efficiency), PBIAS (Percentage of BIAS), and R<sup>2</sup> (coefficient of determination).

#### *2.5. Estimation of Uncertainty in Model Parameters*

Model input data, model structure, parameters, and calibration data are the main sources of model prediction uncertainties associated with hydrological modelling [75,76]. In the SUFI-2 approach used in this study, all sources of model prediction uncertainties are expressed by parameter uncertainty [62,71]. Various parameters such as CN2 (SCS Curve Number), GW\_DELAY (Ground Water Delay time), ESCO (Soil Evaporation Compensation factor), and EPCO (Plant uptake Compensation factor) (22 parameters listed in Table S1) related to streamflow and evapotranspiration computations were used for model calibration. Based on the global sensitivity analysis [62], the five most sensitive parameters with respect to simulating for streamflow and evapotranspiration separately were selected to analyze the uncertainty of parameters in each iteration.

SUFI-2 uses the Latin Hypercube Sampling technique (LHS), which generates possible sets of parameter values from the given parameter ranges. To determine the extent of uncertainty associated with each of the model parameter, the normalized uncertainty scoring method [48,77] was applied, in which parameter values are normalized between 0 and 100 using Equation (2).

$$P\_n = \left[\frac{P\_b - L\_l}{L\_l - L\_l}\right] \times 100\tag{2}$$

where *P<sup>n</sup>* is the normalized uncertainty score of a parameter of one simulation, *P<sup>b</sup>* is the parameter value of the same simulation, and *U<sup>l</sup>* and *L<sup>l</sup>* are upper and lower limits of the corresponding parameter, respectively. The minimum and maximum of the normalized scores represent the uncertainty range associated with that particular parameter. The broader the range of uncertainty scores, the higher the parameter uncertainty.

#### **3. Results**

#### *3.1. Changes of Model Performances with Di*ff*erent Iterations*

#### 3.1.1. Calibration with a Single Variable

Note that all the model performance analysis presented in this study are based on monthly time series data. When adopting the single variable calibration approach, in general, model performance with respect to a variable with which it is calibrated against (e.g., streamflow) improves while the performance with respect to another variable (e.g., evapotranspiration) decreases (Figure 3). However, here the calibration with streamflow alone resulted in a reasonable performance in both streamflow and evapotranspiration on the monthly time series data. This is particularly true with respect to R 2 . For example, in the fifth iteration of Hkamti station, the NSE of streamflow varies between 0.74 and 0.94, whereas, NSE of evapotranspiration varies between 0.30 and 0.55. At the same station and same iteration, PBIAS of streamflow and evapotranspiration varies from −20.0% to −6.7%, and <sup>−</sup>5.2% to <sup>−</sup>15.6%, respectively, while R<sup>2</sup> varies between 0.82 and 0.95 for streamflow, and between 0.8 and 0.85 for evapotranspiration. In contrast, calibration with evapotranspiration alone shows considerably low performance for streamflow. For the same example mentioned above, in the calibration with evapotranspiration alone, the NSE of streamflow ranges between −0.51 and 0.19, and NSE of evapotranspiration varies between 0.32 and 0.72. PBIAS values of streamflow and evapotranspiration vary between <sup>−</sup>60% to <sup>−</sup>12.9%, and <sup>−</sup>15.6% to <sup>−</sup>5.2%, respectively, while R<sup>2</sup> varies between 0 and 0.31 for streamflow, and 0.8 and 0.85 for evapotranspiration (under the same conditions considered above).

*Remote Sens.* **2020**, *12*, x FOR PEER REVIEW 9 of 26

**Figure 3.** Model performance of each iteration in single variable calibration (**a**) Nash–Sutcliffe Efficiency (NSE), (**b**) percentage of bias (PBIAS), and (**c**) coefficient of determination (R2). 'Q' means streamflow and 'ET' means evapotranspiration. Left and right columns respectively show the results of calibration with streamflow alone and evapotranspiration alone for each efficiency indicator. The names in the sub‐plots refer to the streamflow gauging stations from up‐stream to down‐stream. Each station refers to the different sub‐basins: Hkamti—1, Homalin—1 and 2, Kalewa—1 to 4, and Monywa—1 to 9. The indicators of evapotranspiration represent comparisons of area‐weighted average evapotranspiration of simulated and observed data. **Figure 3.** Model performance of each iteration in single variable calibration (**a**) Nash–Sutcliffe Efficiency (NSE), (**b**) percentage of bias (PBIAS), and (**c**) coefficient of determination (R<sup>2</sup> ). 'Q' means streamflow and 'ET' means evapotranspiration. Left and right columns respectively show the results of calibration with streamflow alone and evapotranspiration alone for each efficiency indicator. The names in the sub-plots refer to the streamflow gauging stations from up-stream to down-stream. Each station refers to the different sub-basins: Hkamti—1, Homalin—1 and 2, Kalewa—1 to 4, and Monywa—1 to 9. The indicators of evapotranspiration represent comparisons of area-weighted average evapotranspiration of simulated and observed data.

However, when calibration is carried out at successive stations (starting at Hkamti, then Homalin, Kalewa, and ending at Monywa), the variability of performance reduces considerably for both variables (streamflow and evapotranspiration, Figure 3). This is not surprising because the upstream station is already calibrated for streamflow and evapotranspiration separately. In successive iterations, model simulations show approximately similar performance for both variables for calibration with evapotranspiration alone, but always with better values for evapotranspiration than for streamflow. However, in both calibration approaches, NSE is better for streamflow compared to evapotranspiration. For example, in the fifth iteration of Homalin, the NSE of streamflow varies from 0.91 to 0.95, and from 0.55 to 0.7 for single variable calibration with streamflow and evapotranspiration, respectively (Figure 3a). For the same iteration at the same station, weighted NSE values of evapotranspiration (from sub-basin 1 and 2) vary between 0.27 and 0.41, and between 0.67 and 0.72 in the two calibration approaches (streamflow alone and evapotranspiration alone). In contrast, streamflow shows higher bias: −15% to −11% and −28% to 21% for calibration with streamflow and evapotranspiration, respectively, whereas the bias of evapotranspiration changes between −14% to −9%, and between −5% to 4%, respectively (Figure 3b). However, R<sup>2</sup> is greater than 0.77 for streamflow, and evapotranspiration in both calibration approaches (Figure 3c), implying that both variables preserve the temporal characteristics of observed data.

To assess the variation of the model performance belongs to best model simulation with respect to number of iteration, we analyzed the NSE as a representative model performance indicator of the best simulation in each iteration. In successive calibration iterations, an increase in the NSE of the calibration variable (either streamflow or evapotranspiration) is not always accompanied by a decline in the performance of the other variable (Figure 4a). For example, in the calibration with streamflow alone, the NSE of evapotranspiration increased when the NSE of streamflow also increased, and subsequently decreased in next iteration(s) at all stations but Kalewa station (Figure 4a). Similar results were also obtained in calibration with evapotranspiration alone (Figure 4b). This may be because different parameter combinations also could simultaneously increase the performances of both variables. *Remote Sens.* **2020**, *12*, x FOR PEER REVIEW 10 of 26

**Figure 4.** NSE of the best simulation in each iteration performed at each station under (**a**) calibration only with streamflow (Q), (**b**) calibration only with ET, and (**c**) calibration with both streamflow and ET. The scales of two y‐axes in each plot are not identical. **Figure 4.** NSE of the best simulation in each iteration performed at each station under (**a**) calibration only with streamflow (Q), (**b**) calibration only with ET, and (**c**) calibration with both streamflow and ET. The scales of two y-axes in each plot are not identical.

#### 3.1.2. Calibration with Multiple‐variables 3.1.2. Calibration with Multiple-variables

In the multivariable calibration approach using streamflow and GLEAM ET, successive iterations improve the performance of evapotranspiration predictions compared to that of streamflow at all stations except the Hkamti station (Figure 5). For example, in the final iteration at Homalin, Kalewa, and Monywa, NSE values of streamflow vary between 0.86 and 0.92, 0.92 and 0.96, In the multivariable calibration approach using streamflow and GLEAM ET, successive iterations improve the performance of evapotranspiration predictions compared to that of streamflow at all stations except the Hkamti station (Figure 5). For example, in the final iteration at Homalin, Kalewa, and Monywa, NSE values of streamflow vary between 0.86 and 0.92, 0.92 and 0.96, and

over‐estimations of evapotranspiration in all three above mentioned stations. However, model results always underestimated streamflow at all but the Monywa station (Figure 5b). With respect to R2, model calibrations show good performances (>0.77) for both streamflow and evapotranspiration at all the stations (Figure 5c). This behavior implies that simulated streamflow and 0.95 and 0.97, respectively. During the same simulation, the NSE values of evapotranspiration for the same stations vary between 0.46 to 0.66, 0.53 to 0.63, and 0.63 to 0.67, respectively (Figure 5a). In addition, PBIAS values of evapotranspiration vary between −6% and +7%, indicating under and over-estimations of evapotranspiration in all three above mentioned stations. However, model results always underestimated streamflow at all but the Monywa station (Figure 5b). With respect to R<sup>2</sup> , model calibrations show good performances (>0.77) for both streamflow and evapotranspiration at all the stations (Figure 5c). This behavior implies that simulated streamflow and evapotranspiration adequately capture the temporal variability in the observed data. Compared to single variable calibration, multivariable calibration also reduced biases in both streamflow and evapotranspiration. *Remote Sens.* **2020**, *12*, x FOR PEER REVIEW 12 of 26

**Figure 5.** Model performance of each iteration in the multivariable calibration (**a**) NSE (**b**) PBIAS (**c**) R2. Similar to Figure 3 performances of evapotranspiration are evaluated via area‐weighted average evapotranspiration for each corresponding streamflow station. **Figure 5.** Model performance of each iteration in the multivariable calibration (**a**) NSE (**b**) PBIAS (**c**) R<sup>2</sup> . Similar to Figure 3 performances of evapotranspiration are evaluated via area-weighted average evapotranspiration for each corresponding streamflow station.

are under estimated during high flow periods (June to August). This mismatch in high flows may be

When considering the best performing simulation in the last iteration for both single variable

*3.2. Model perfroamnce with 'Best Parameter Set'*

In the best performed simulation in every successive iterations of multivariable calibration, the NSE of streamflow shows an increase in most of the cases, whereas that of evapotranspiration shows a reduction (Figure 4c). However, these decreases are small (e.g., from 0.62 to 0.61 at Kalewa station, and from 0.68 to 0.64 at Monywa station). In multivariable calibration, NSEs of evapotranspiration reduce at all but Kalewa station for every successive iteration (Figure 4c). Performance of streamflow does not always increase with the reduction of evapotranspiration performances. For example, at Homalin station, NSE of streamflow increases from 0.89 to 0.92 in the second iteration and reached 0.91 by the third iteration. Thus, a clear pattern in the performance of individual variables is not identified.

#### *3.2. Model Perfroamnce with 'Best Parameter Set'*

When considering the best performing simulation in the last iteration for both single variable calibrations, results with streamflow alone show better performance in dry seasons (i.e., during low flows) than in monsoon seasons (high flows) (Figure 6a). Notably, the flows at Hkamti and Homalin are under estimated during high flow periods (June to August). This mismatch in high flows may be due to the poor spatial representation of rainfall over sub-basins. Model calibration with evapotranspiration alone under-estimates high flows and over-estimates the low flows at all stations for the entire simulation period.

When the calibration with evapotranspiration alone is considered, GLEAM ET and simulated ET show a well–matched seasonal variation of evapotranspiration in all the sub-basins (Figure 6b). However, in terms of magnitude or bias, model estimated ET is lower than the GLEAM ET during the dry period (December to February). In contrast, the model estimated ET is higher than GLEAM ET in sub-basins 6, 8, and 9 during the monsoon period. The calibration with streamflow alone under-estimates evapotranspiration in all sub-basins during the dry period, and over-estimates the same during the monsoon period particularly in sub-basins 1, 6, 8, and 9. Overall, calibration with streamflow alone appears to be capable of reproducing the observed evapotranspiration (in this case the GLEAM ET) at an acceptable level.

In multivariable calibration, simulated streamflow shows similar results obtained from calibration with streamflow alone (Figure 6a). However, high flows (during July to August) are less than the same resulted in calibration with streamflow alone. High flows are further underestimated in multivariable calibration. In contrast, simulated evapotranspiration at basin-level shows reasonably good performance similar to results obtained from calibration only with evapotranspiration (Figure 6b). Therefore, multivariable calibration shows reasonable estimations for both variables (i.e., streamflow and evapotranspiration).

Model performances corresponding to the best simulation at each of the station are presented in Table 1. Overall, in each performance indicator of streamflow, there is no significant difference among four stations. Among all sub-basins, the least accurate performances of evapotranspiration are for sub-basin 6, 8 and 9. These results imply that multivariable calibration is able to reproduce the observed streamflows better than evapotranspiration. However, model simulations produced a reasonably good estimation of evapotranspiration as well.

*Remote Sens.* **2020**, *12*, x FOR PEER REVIEW 14 of 26

**Figure 6.** *Cont*.

*Remote Sens.* **2020**, *12*, x FOR PEER REVIEW 15 of 26

**Figure 6.** Comparison of simulated and observed (**a**) streamflow and (**b**) evapotranspiration for the best simulation of the last iteration for each single variable calibration approach for four stations and nine sub‐basins. **Figure 6.** Comparison of simulated and observed (**a**) streamflow and (**b**) evapotranspiration for the best simulation of the last iteration for each single variable calibration approach for four stations and nine sub-basins.

In general, the multivariable calibration with streamflow and GLEAM ET showed reasonably good streamflow estimations (NSE > 0.85). However, understandably, that calibration does not result in producing better streamflow performance than the results obtained from single variable calibration with streamflow only (Figure 4c, Figure 6a and Table 1). On the other hand, this multivariable calibration showed still sounds better for evapotranspiration as that from the single variable calibration with evapotranspiration only and better than the single variable calibration with streamflow only (Figure 6b).

*Remote Sens.* **2020**, *12*, 3768

**Table 1.** Model performance for monthly streamflow and evapotranspiration of the Chindwin River Basin. Results represent the overall best model simulation in the last iteration of the multivariable calibration at each station and sub-basin. A: Model calibration with streamflow only. B: Model calibration with evapotranspiration only. C: Model calibration with both streamflow and evapotranspiration. The best statistical performance among the streamflow stations and among the evaporation at sub-basin levels indicates in bold for each calibration approach.

