1. Introduction
Accurate prediction of hydrological variables plays an irreplaceable role in areas such as drought prevention, flood warning, and water resource management (Samaniego et al., 2019 [
1]; Cloke et al., 2015 [
2]; Blöschl et al., 2019 [
3]). Over the years, process-based physical models have been employed for forecasting various hydrological variables (Talchabhadel et al., 2018 [
4]; Sirisena et al., 2018 [
5]). These models, deeply rooted in scientific theory, establish a theoretical statistical model based on physical formulas, yielding results with high reliability and interpretability. In recent years, advancements in remote sensing technologies have enhanced the capability of physical models to describe hydrological processes at finer scales (Khatakho et al., 2021 [
6]; Tarek et al., 2020 [
7]). However, limitations persist within these models. Because they are grounded in scientific theory, they typically capture only a subset of the true processes within the ecosystem, which imposes a number of constraints (Hilborn & Mangel, 1997 [
8]). Simultaneously, the calibration of these models may be significantly influenced by real-world processes not encompassed within the model itself (Arhonditsis & Brett, 2004 [
9]).
In recent years, the rapid expansion of data and continuous innovations in computational technology have led to the emergence of a powerful tool, namely deep learning (DL) models. DL models possess the capability to effectively capture complex, nonlinear spatial and temporal correlations within hydrological processes (Shen et al., 2018 [
10]). They exhibit the ability to forecast occurrences beyond sampled situations (Herrnegger et al., 2019 [
11]). Hence, DL models are considered excellent alternatives to purely physical models for predicting hydrological variables. DL is widely used in hydrology, offering precise predictions of runoff and groundwater levels that aid governments in planning water resource management strategies (Bai et al., 2016 [
12]; Kratzert et al., 2018 [
13]). Additionally, DL models can accurately forecast floods and droughts, providing valuable information for disaster preparedness and sustainable agricultural development (Zhang et al., [
14]). Furthermore, DL models help researchers gain deeper insights into regional hydrological behavior (Kratzert, Klotz, Herrnegger, et al., 2019; Kratzert, Klotz, Shalev, et al., 2019 [
15]; Read et al., 2019 [
16]; Shen, 2018; Shen et al., 2018 [
17]; Lees et al., 2021 [
18]; Nearing et al., 2021 [
19]; Yang et al., 2020 [
20]; Huang et al., 2023 [
21]). The application of various deep learning models in hydrology, such as ANNs (Yang et al., 2020; Majid et al., 2024 [
22]), CNNs (Jiang et al., 2020 [
23]), and RNNs (Sadeghi Tabas & Samadi, 2022 [
24]; Shah et al., 2024 [
25]), further underscores their applicability and extensive prospects in hydrological variable prediction. Among these models, RNNs are particularly suited for handling hydrological variable data with high temporal dependency due to their sequential processing of input data. However, their effectiveness in modeling hydrological processes with larger time scales is constrained by their difficulty in retaining time series information beyond 10 time steps (Bengio et al., 1994 [
26]). Long Short-Term Memory (LSTM) networks, an advanced iteration of Recurrent Neural Networks (RNNs), feature innovative internal gating architecture pioneered by Hochreiter and Schmidhuber in 1997 [
27], which effectively surmounts the challenge of capturing long-term dependencies in sequential data, setting a new standard for time series analysis. Owing to its exceptional ability to handle time series data, LSTM has achieved significant success in hydrological forecasting. Liu and colleagues successfully employed the LSTM model to simulate runoff phenomena in Wuhan, China (2021 [
28]). Similar initiatives were undertaken in the United Kingdom and the United States, as demonstrated by Lees et al. (2021) and Kratzert et al. (2018), respectively. Lees and colleagues applied the same LSTM methodology across a broad dataset of 669 basins in the UK, while in the US, Kratzert and associates adeptly used LSTM models for accurate runoff predictions in various basins. Moreover, LSTM’s versatility is further demonstrated by its exceptional predictive performance in groundwater and rainfall forecasting. Kapoor et al. led pioneering efforts by integrating deep learning with traditional rainfall runoff modeling, as highlighted in their 2023 study [
29]. In addition, they made significant advancements in accurately predicting cyclonic trajectories and intensities using variational recurrent neural networks, a study they published in the same year (Kapoor et al., 2023 [
30]).
Up to this point, the majority of deep learning models in the hydrology domain have typically operated independently on a single variable, such as water level (Bowes et al., 2019 [
31]), groundwater (Ali et al., 2022 [
32]), evapotranspiration (Zhao et al., 2019 [
33]), and soil moisture (Fang & Shen, 2020 [
34]; Li et al., 2022 [
35]; Li et al., 2024 [
36]). Nevertheless, the hydrological system is an extensive and intricate framework where crucial interactions typically exist among various hydrological variables. Solely modeling one hydrological variable can result in models neglecting the interactions between different hydrological processes. In physical models predicting evapotranspiration, incorporating runoff as an auxiliary variable enhances the model’s understanding of the hydrological processes linking runoff and evapotranspiration, thereby improving prediction accuracy (Herman et al., 2018 [
37]; Nesru et al., 2020 [
38]).
In deep learning, multitask learning helps models better understand the interactions among variables, thereby enhancing the efficiency and accuracy of predictions for each task (Zhang and Yang, 2021 [
39]). Additionally, studies in hydrology have shown that multitask learning enhances the understanding of hydrological processes and improves model performance. Sadler et al. (2022) conducted experiments with simple deep learning models on data from 101 sites across the US mainland, revealing that at 56 sites, the Nash–Sutcliffe efficiency of multitask models exceeded that of single-task models. Li et al. (2023 [
40]) tested multitask learning with LSTM models in three large mountainous basins on the Tibetan Plateau, showing that LSTM models combined with multitask learning estimated runoff more accurately than pure LSTM models, with NSE increasing by approximately 0.02. However, these studies only discuss the interactions between two variables and improve the prediction accuracy of the primary variable; they do not demonstrate the generalizability of multitask learning in hydrology, nor do they delve into the factors and reasons that affect the performance of multitask learning in forecasting hydrological variables.
In this study, we developed a multitask learning model comprising an LSTM at the base and multiple parallel fully connected layers at the output. This model uses multitask learning to capture interactions between variables, strengthening the LSTM's capacity to model diverse hydrological processes and thereby enhancing prediction accuracy across multiple tasks. During the experimental phase, we performed dual-task modeling for soil moisture and evapotranspiration, regulating the task loss weight allocation to investigate its impact on the multitask model's performance. Furthermore, we explored the interaction relationships among four hydrological variables (volumetric soil water layer 1 (SM1), soil temperature level 1 (ST1), evapotranspiration (ET), and surface sensible heat flux (SSHF)), including their correlations and transfer directions. To assess the model's generalizability, we tested its ability to predict in diverse climates and during extreme events, and we evaluated its data utilization efficiency and robustness by expanding the number of prediction tasks. Our overarching goal is to demonstrate the broad applicability of multitask learning in hydrological variable prediction through various experimental perspectives, together with an in-depth analysis of the factors influencing its performance. In the subsequent sections of this paper, we first introduce the utilized datasets, the designed deep learning model, and the detailed configurations of all experiments (
Section 2). Subsequently, we showcase and analyze the experimental outcomes on the LandBench1.0 dataset (
Section 3), followed by an in-depth discussion of the results (
Section 4), concluding with a summary of this work in
Section 5.
2. Materials and Methods
2.1. Data Sources
This study utilized the LandBench1.0 dataset for experimental data. Created by Li et al. in 2023, LandBench1.0 is a benchmark dataset designed to facilitate research in predicting land surface variables (LSVs) (Li et al., 2023 [
41]). This dataset addresses the need for a comprehensive and standardized resource to evaluate the performance of various data-driven deep learning models in hydrological and atmospheric sciences. By offering extensive coverage of variables and multiple resolutions (0.5°, 1°, 2°, and 4°), along with various lead times, LandBench1.0 improves the consistency of data-driven deep learning models for LSVs, making it a robust platform for developing and comparing predictive models.
The primary components of LandBench1.0 include data from the ERA5-Land reanalysis dataset, which provides global land surface data such as soil moisture, soil temperature, surface latent heat flux, surface sensible heat flux, and runoff. These data, derived from a combination of satellite and ground-based observations and numerical weather prediction models, ensure consistency and reliability (Muñoz-Sabater et al., 2021 [
42]). Additionally, LandBench1.0 incorporates static physiographic attributes, including soil texture, soil water capacity, and vegetation type, sourced from datasets such as SoilGrids and MODIS (Rasp et al., 2020 [
43]).
ERA5 reanalysis datasets combine various observational data sources, such as ground stations, satellites, ships, and aircraft, with numerical weather prediction models to produce globally consistent atmospheric and hydrological data with high spatial and temporal resolution. Recent studies have demonstrated that models trained with reanalysis datasets perform similarly or even better than those trained with observational datasets. For instance, Lee et al. (2021 [
44]) evaluated the reliability of ERA5 reanalysis data by comparing the performance of deep learning models trained with ERA5 data to those trained with Global Historical Climatology Network (GHCN) observational data. The study focused on temperature data from temperate (Dublin) and tropical (Singapore) regions between 2015 and 2019. Results indicated that models trained with ERA5 data performed similarly to those trained with GHCN data in temperate regions, effectively replicating the seasonal temperature trends captured by observational datasets. Yilmaz et al. (2023 [
45]) assessed the performance of deep learning models trained with ERA5 and ERA5-Land reanalysis datasets against those trained with ground-based observations from Turkey between 1951 and 2020. The study demonstrated that models trained with reanalysis datasets could accurately capture long-term trends and seasonal variations. Furthermore, the reanalysis datasets slightly outperformed observational data in capturing mean trends and temporal variability. These findings indicate that reanalysis datasets can enhance model performance, especially in regions with sparse or unavailable observational data. In 2023, Bi and colleagues introduced the Pangu weather forecasting system, trained using a 39-year reanalysis dataset ([
46]). The results showed that Pangu produced more accurate deterministic forecasts on reanalysis data compared to the world’s leading Numerical Weather Prediction (NWP) system, the ECMWF’s operational IFS, and did so at a faster speed. Notably, Pangu demonstrated comparable performance to the IFS for deterministic forecasts up to 7 days in advance. Furthermore, the Fuxi system (Chen et al. (2024) [
47]) extended this advance forecast capability to 15 days. These findings indicate that deep learning models can effectively learn the complex uncertainties and multivariable interactions present in the real atmospheric environment from reanalysis data. These studies indicate that deep learning models trained on reanalysis datasets can accurately simulate real atmospheric and hydrological environments to enhance forecast performance. Additionally, these models perform similarly to, or even surpass, those trained on observational data.
Ground station observational data often have temporal discontinuities, with missing days of data. In contrast, reanalysis datasets do not have these gaps, ensuring temporal continuity in model training data. Additionally, reanalysis datasets lack the pronounced regional differences found in ground station data, which facilitates better training and evaluation of models for global generalization. Therefore, we utilized the LandBench1.0 dataset, which spans from 1979 to 2020, focusing on data from 2000 to 2020 with a resolution of 1°. To accurately represent the complex interactions within the hydrological and atmospheric environment, we selected 15 variables, including surface, atmospheric, and static variables, as inputs for our model. Details of all selected variables are presented in
Table 1.
2.2. Model Development
We developed a multitask learning model based on LSTM to improve the accuracy of hydrological variable forecasting by leveraging the strengths of both LSTM and multitask learning. LSTM focuses on learning long-term dependencies, while multitask learning considers interactions among multiple variables. The model’s primary architecture includes an LSTM layer as the shared layer and multiple parallel fully connected layers as the output layers (see
Figure 1). In our model, the LSTM layer assimilates all input sequences (atmospheric forcing, land surface variables, and static terrain attributes), processing them into intermediary sequences. As the shared layer, the LSTM learns rich representations across tasks and captures relationships between variables, enhancing performance in simulating hydrological processes. The intermediary sequences serve as inputs for all output modules, with each task having a dedicated module to ensure independent outputs. Next, we detail the model's design.
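As a concrete illustration, a minimal PyTorch sketch of this architecture is given below. The layer sizes, the single-layer LSTM, and the Dropout placement are our illustrative assumptions, not the exact hyperparameters used in this study:

```python
import torch
import torch.nn as nn

class MultiTaskLSTM(nn.Module):
    """Shared LSTM encoder with one parallel fully connected head per task.

    A hedged sketch of the architecture described in Section 2.2; sizes
    and Dropout placement are illustrative assumptions.
    """

    def __init__(self, n_features: int, hidden_size: int, n_tasks: int,
                 dropout: float = 0.2):
        super().__init__()
        # Shared layer: processes the full input sequence for all tasks.
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.dropout = nn.Dropout(dropout)
        # One dedicated output module per task, applied in parallel.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_size, 1) for _ in range(n_tasks)]
        )

    def forward(self, x: torch.Tensor) -> list:
        # x: (batch, seq_len, n_features), e.g. seq_len = 7 days of forcing.
        out, _ = self.lstm(x)
        last = self.dropout(out[:, -1, :])  # hidden state of the last day
        # Each head produces an independent prediction for its own task.
        return [head(last) for head in self.heads]
```

A forward pass with a batch of 7-day sequences returns one prediction tensor per task, which is what the per-task loss computation in Section 2.2.3 consumes.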
2.2.1. Long Short-Term Memory (LSTM) Networks
In Recurrent Neural Networks (RNNs), each time step’s information often includes the previous step’s data as part of its input. This aspect holds significant importance in hydrological variable forecasting due to the high auto-correlation present in these variables (that is, a high degree of correlation between the current time’s value and the value from the previous time step). LSTM, a variant of RNNs, addresses RNNs’ weakness in handling long-term dependencies (Sherstinsky, 2020 [
48]). LSTM’s capability in dealing with long-term dependencies lies in the coordinated operation of its internal memory units (c(t) in
Figure 1) and hidden units (h(t) in
Figure 1), capturing slow- and fast-evolving processes, respectively. Moreover, its internal three-gate structure (i.e., input, forget, and output gates) regulates the storage, deletion, and transmission of information within each unit. These frameworks contribute to enhancing the LSTM model’s ability to handle long-term dependency relationships.
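For reference, the standard formulation behind this description is given below, with $\sigma$ the logistic sigmoid, $\odot$ element-wise multiplication, and the usual weight notation (these symbols are the conventional ones, not necessarily those of Figure 1):

```latex
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \quad \text{(input gate)} \\
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \quad \text{(forget gate)} \\
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \quad \text{(output gate)} \\
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t = o_t \odot \tanh(c_t)
```

The forget and input gates jointly regulate what the slow-evolving memory cell $c_t$ retains, while the output gate controls what the fast-evolving hidden state $h_t$ exposes at each step.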
2.2.2. Multitask Learning
Multitask learning involves configuring multiple tasks as optimization objectives within deep learning models (Caruana, 1997 [
49]). This approach enables models to leverage information contained within certain tasks to enhance the predictive performance of most tasks. Moreover, compared to single-task approaches, multitask learning has been proven in fields such as natural language processing (Chen et al., 2014 [
50]; Seltzer & Droppo, 2013 [
51]) and computer vision (Girshick, 2015 [
52]) to achieve higher computational efficiency, reduced overfitting, and improved generalization. This is because multitask learning guides models to acquire generic features favored by the majority of relevant tasks (Li et al., 2021 [
53]). Furthermore, multitask learning can alleviate a limitation of deep learning models, wherein deep learning typically requires a substantial amount of observational data for model training. In the field of earth sciences, acquiring data might be prohibitively expensive. Through the multitask learning approach, it might be possible to utilize variables abundant in data to assist in training the relevant variables that suffer from data scarcity, thereby enhancing the model’s predictive capacity for these data-scarce variables.
2.2.3. Model Training
Deep learning models in hydrology have typically been trained one model per variable. Hence, we trained four single-task models to predict SM1, ST1, ET, and SSHF. Each single-task model comprises an LSTM layer, a Dropout layer, and a fully connected layer, with the loss for each training iteration defined as

$$L_i = \ell(y_i, \hat{y}_i), \quad (1)$$

where $y_i$ represents the observed value of iteration $i$ and $\hat{y}_i$ represents the predicted value of iteration $i$. In our experiments, we employed cross-entropy for the loss calculation:

$$\ell(y_i, \hat{y}_i) = -\left[\, y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \,\right]. \quad (2)$$
The multitask model comprises an LSTM layer and multiple parallel fully connected layers equal to the number of input variables (for instance, if the input variables are SM1 and ET, then the multitask model has two fully connected layers) to simultaneously predict multiple variables (as illustrated in
Figure 1). The model's loss function is

$$L_i = \sum_{j=1}^{T} w_j L_{i,j}, \quad (3)$$

where $L_{i,j}$ represents the loss for the $i$-th iteration of the $j$-th task (Equation (1)) and $w_j$ represents the loss weight for the $j$-th task.
In traditional multitask models, parameters are typically updated by computing the losses for all tasks and then collectively updating the entire model’s parameters. However, in our model, for each output layer, only the loss gradient related to the corresponding variable is used to update that layer’s parameters. For example, only the error in predicting SM1 is employed to update the parameters of the SM1 output layer. Conversely, all variable losses are aggregated (Equation (3)) to update the shared layer, namely the LSTM layer’s parameters. We adopted this approach because the LSTM layer processes data from all variables, necessitating all task losses for parameter updates, whereas each output layer is dedicated to forecasting a single task. Therefore, only the loss associated with the predicted task is used for parameter updates to ensure the independence of output results. Through experimentation, we have demonstrated that employing this parameter update methodology yields superior predictive performance compared to the traditional approach.
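A hedged sketch of one such update step is shown below, with a tiny stand-in model (sizes illustrative) and MSE standing in for the loss choice above. Note that backpropagating the weighted sum of Equation (3) already routes gradients as described: each head's parameters enter only its own task's loss, so only that loss can update the head, while the shared LSTM receives the aggregate:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMultiTask(nn.Module):
    """Minimal stand-in for the shared-LSTM multitask model:
    a shared encoder with two task heads (sizes are illustrative)."""

    def __init__(self):
        super().__init__()
        self.shared = nn.LSTM(3, 8, batch_first=True)
        self.heads = nn.ModuleList([nn.Linear(8, 1) for _ in range(2)])

    def forward(self, x):
        out, _ = self.shared(x)
        return [h(out[:, -1]) for h in self.heads]

def training_step(model, optimizer, x, targets, weights):
    """One parameter update. The weighted sum of task losses
    (Equation (3)) drives the shared layer; each head only receives
    its own task's gradient, since the other losses do not depend on
    that head's parameters."""
    optimizer.zero_grad()
    preds = model(x)
    task_losses = [F.mse_loss(p, t) for p, t in zip(preds, targets)]
    total = sum(w * l for w, l in zip(weights, task_losses))
    total.backward()
    optimizer.step()
    return [l.item() for l in task_losses]
```

Calling `training_step` once per batch reproduces the per-head/shared-layer gradient routing described in the paragraph above.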
2.3. Experimental Setting
To comprehensively examine the applicability of multitask learning in hydrology, we designed four experiments from various perspectives.
Experiment A and Experiment B discuss factors affecting multitask learning, specifically task weighting and the impact of different variable combinations (Li et al., 2023 [
54]). Experiment C explores how the size of the dataset constrains the performance of multitask models (Sadler et al., 2022 [
55]). Finally, Experiment D examines the inclusiveness of multitask models regarding the number of tasks.
2.3.1. Experiment A: Multitask Modeling with Different Weight Allocation Schemes
In Experiment A, we aimed to investigate the degree to which the magnitude of loss weights assigned to different variables within a multitask model affects its performance, and the specific reasons behind this impact. When modeling SM1 (Task 1) and ET (Task 2), we manually controlled the loss weights for these two variables, adhering to the constraint

$$\sum_{i=1}^{T} w_i = 1, \quad (4)$$

where $w_i$ represents the loss weight for the $i$-th task. We allocated weights to the two tasks across five ratios (2:8, 3:7, 5:5, 7:3, and 8:2), where a ratio of 2:8 signifies a loss weight of 0.2 for Task 1 and 0.8 for Task 2. By assigning different weights, this experiment altered the proportions of ET and SM1 in the total loss (Equation (3)), with the goal of exploring how these varying proportions impact the model's performance.
Due to the substantial influence of task weights on multitask learning, numerous studies are dedicated to devising balanced algorithms for task weight allocation in multitask models. These algorithms can generally be categorized into three types: learning-based, optimization-based, and computation-based approaches.
Learning-based approaches consider task loss weights as learnable parameters, optimizing them explicitly through gradient descent. Notable methods representing this approach include Gradient Normalization (GradNorm) (Chen et al., 2018b [
56]) and Uncertainty Weights (UW) (Kendall et al., 2018 [
57]). Optimization-based approaches transform task weight allocation into a multi-objective optimization problem, directly deriving task weights by solving equations, such as MGDA (Sener & Koltun, 2018 [
58]). Computation-based methods are among the most widely used in current multitask weight allocation algorithms. Their principle involves computing the most suitable weights by aggregating gradients or losses from all tasks. Representative methods include Dynamic Weight Average (DWA) (Liu et al., 2019 [
59]), Projecting Conflicting Gradient (PCGrad) (Yu et al., 2020 [
60]), and Gradient sign Dropout (GradDrop) (Chen et al., 2020 [
61]).
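As one concrete example of a computation-based scheme, DWA can be sketched in a few lines. This is a simplified reading of Liu et al. (2019): each task's weight grows with the ratio of its last two losses, so tasks whose loss is falling more slowly get more weight. The temperature `T` and the bookkeeping of past losses are our assumptions:

```python
import math

def dwa_weights(prev_losses, prev_prev_losses, T=2.0):
    """Dynamic Weight Average (after Liu et al., 2019), as a sketch.

    r_k = L_k(t-1) / L_k(t-2) measures how slowly task k's loss falls;
    weights are a temperature-softened softmax of r_k, scaled so they
    sum to the number of tasks K.
    """
    K = len(prev_losses)
    r = [lp / lpp for lp, lpp in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(x / T) for x in r]
    s = sum(exps)
    return [K * e / s for e in exps]
```

With equal loss ratios all tasks receive weight 1; a task whose loss stagnates relative to the others is up-weighted on the next epoch.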
However, most current weight allocation algorithms dynamically balance different tasks (Xin et al., 2022 [
62]) without specifically emphasizing the dominant role of a particular variable in weight allocation. Therefore, in Experiment A, we did not utilize weight allocation algorithms. Instead, we manually controlled weight assignments to observe the performance of the model when a specific variable dominates the model’s parameter updates.
2.3.2. Experiment B: Multitask Modeling with Different Variable Combinations
The relationships among different variables vary in strength, and multitask learning is a method that leverages these relationships to enhance model performance by utilizing information from related variables. Therefore, the varying closeness of relationships among different variable combinations can lead to changes in the performance of multitask models. In Experiment B, we combined and modeled four important and correlated hydrological variables (SM1, ET, ST1, and SSHF) (Zhang et al., 2005 [
63]), as detailed in
Table 2. The aim was to assess the transfer trends of relationships between different variable combinations by contrasting the performance of models formed by these combinations (Chen et al., 2020).
In multitask learning, relationships among variables typically fall into three categories: positive transfer, negative transfer, and no significant transfer. Positive transfer may also be unidirectional, in which case multitask learning can only use information from one variable to aid the training of the other.
2.3.3. Experiment C: Multitask Modeling with Varied Data Quantities
During model training, every iteration involves randomly selecting N points from the global grid and extracting consecutive seven-day data from these points spanning the years 2015 to 2019, serving as input sequences. This means that each iteration incorporates N sequences of seven days’ worth of data for model training.
Multitask learning has demonstrated its efficacy in effectively utilizing data to aid model training in domains such as natural language processing (Chen et al., 2014; Seltzer & Droppo, 2013) and computer vision (Girshick, 2015), particularly in scenarios where data are sparse. To assess the extent of information utilization of hydrological variables through multitask learning within the hydrology domain, Experiment C employs five values of N (64, 128, 256, 512, and 1024, representing five training datasets ranging from low to high amounts of training data) to model variables SM1 and ET. This experiment aims to evaluate the degree to which the multitask model utilizes data and the performance of the multitask model in cases of data sparsity in the hydrology domain.
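The sampling procedure described above can be sketched as follows; the array layout and names are our illustrative assumptions, not the paper's data pipeline:

```python
import numpy as np

def sample_batch(data, n_points, seq_len=7, rng=None):
    """Draw one training batch: pick N grid points at random and cut a
    consecutive `seq_len`-day window from each.

    `data` is assumed to have shape (n_grid_points, n_days, n_features),
    covering the 2015-2019 training period.
    """
    if rng is None:
        rng = np.random.default_rng()
    n_grid, n_days, _ = data.shape
    points = rng.integers(0, n_grid, size=n_points)
    starts = rng.integers(0, n_days - seq_len + 1, size=n_points)
    # Each row is one grid point's consecutive seq_len-day sequence.
    return np.stack([data[p, s:s + seq_len] for p, s in zip(points, starts)])
```

Each call returns an `(N, 7, n_features)` batch, matching the "N sequences of seven days' worth of data" per iteration.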
2.3.4. Experiment D: Multitask Modeling with Three Variables
In Experiment D, we expanded the forecasting tasks to three, organized into two groups based on relationships among variables (the first group comprises: SM1–ET–ST1; the second group comprises: SM1–ST1–SSHF). This experiment aims to assess the model’s performance in predicting three tasks, investigating the model’s generalizability concerning the number of forecasting tasks. Additionally, it explores the multitask model’s capacity to learn complex task relationships as the prediction tasks increase.
2.4. Experimental Details
We conducted experiments using global surface data from the LandBench1.0 dataset with a resolution of 1°. The training data spanned from 1 January 2015 to 31 December 2019, with the initial 80% used for training and the remaining 20% for testing. The evaluation data covered the period from 1 January 2020 to 31 December 2020. During both the training and evaluation phases, each input sequence consisted of seven days. Data were input as time series, a crucial format due to the high temporal dependence of hydrological variables, which reflect states from previous periods. To prevent potential numerical overflow or instability during computations, all data were preprocessed using Min-Max Scaling, ensuring the processed data fell within the range of 0 to 1.
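The Min-Max Scaling step can be sketched as below; the epsilon guard for constant features is our assumption, since the paper does not state how such features are handled:

```python
import numpy as np

def min_max_scale(x, axis=0):
    """Map every feature into [0, 1] via Min-Max Scaling.

    A small floor on the range guards against division by zero for
    constant features (our assumption).
    """
    lo = x.min(axis=axis, keepdims=True)
    hi = x.max(axis=axis, keepdims=True)
    return (x - lo) / np.maximum(hi - lo, 1e-12)
```

Scaling each feature independently keeps all inputs in a comparable numeric range, which avoids the overflow and instability issues noted above.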
Each model underwent a maximum of 1000 training epochs, with 300 iterations per epoch. Following every twenty epochs, the model underwent testing. A stopping criterion, designed as an early stopping mechanism, was implemented during testing. This mechanism compared the loss of the current test to the loss of the previous one, retaining the model parameters associated with the smaller loss. If the same model parameters were preserved for ten consecutive tests, training was halted prematurely, and these preserved parameters constituted the final model parameters. Each model in the experiment was trained five times, each time with a different random initialization of model parameters. This approach aimed to account for the influence of initial parameters on model performance. The resulting metrics were averages derived from the metrics obtained by the five models. Across all experiments, the optimizer was set to Adam, with an initial learning rate of 0.001.
During the model evaluation phase, we assessed global surface grid points every 20 days. To evaluate model performance, we calculated each metric from all predicted and observed values at each grid point using the formulas in Section 2.5. The final result reported for each metric was its median across all global surface grid points.
In Experiments B, C, and D, the weight allocation for tasks within the model remained constant. Specifically, Experiments B and C used a 2:8 allocation, while Experiment D used a 1:6:3 allocation (the first task weighted 0.1, the second 0.6, and the third 0.3, mirroring the 2:8 configuration). These allocations were determined through extensive experimentation; even where they are not the best possible allocation for a particular task combination, they serve as strong suboptimal solutions.
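For clarity, the compact weight notation used in the experiments (e.g., "28" for weights 0.2 and 0.8) can be decoded with a small hypothetical helper; this function is our illustration, not part of the original pipeline:

```python
def decode_ratio(code: str):
    """Hypothetical helper for the digit-string weight notation:
    '28' -> (0.2, 0.8), '163' -> (0.1, 0.6, 0.3).

    Each digit is one task's loss weight in tenths; the digits must
    sum to 10 so the weights sum to 1 (Equation (4))."""
    digits = [int(c) for c in code]
    assert sum(digits) == 10, "task weights must sum to 1"
    return tuple(d / 10 for d in digits)
```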
2.5. Evaluation Metrics
This study employed various evaluation metrics for assessing and comparing single-task and multitask models: the Kling–Gupta Efficiency coefficient (KGE; Equation (5)), Nash–Sutcliffe Efficiency (NSE; Equation (6)), coefficient of determination (R²; Equation (7)), Bias (Equation (8)), and Root Mean Square Error (RMSE; Equation (9)).

$$\mathrm{KGE} = 1 - \sqrt{(r-1)^2 + (\beta-1)^2 + (\gamma-1)^2}, \quad (5)$$

where $r$ represents the linear correlation between predicted and observed values; $\beta = \mu_s / \mu_o$ represents the ratio of the predicted mean to the observed mean, where $\mu_s$ is the mean of the predicted values and $\mu_o$ is the mean of the observed values; and $\gamma = (\mathrm{std}_s / \mu_s) / (\mathrm{std}_o / \mu_o)$ represents the ratio of predicted to observed variability, where std is the standard deviation. The KGE (Kling et al., 2012 [
64]) value signifies the degree of agreement between predicted and observed values in terms of correlation, bias, and variability ratio. This metric offers a comprehensive assessment of model performance, ranging from −∞ to 1, where a value closer to 1 indicates a higher level of agreement between simulated and observed values.
NSE (Nash–Sutcliffe Efficiency) assesses the prediction accuracy of a model:

$$\mathrm{NSE} = 1 - \frac{\sum_{t}(\hat{y}_t - y_t)^2}{\sum_{t}(y_t - \bar{y})^2}, \quad (6)$$

ranging from negative infinity to 1. A value of 1 indicates a perfect match between the model simulation and observed values. When NSE equals 0, the model's performance is equivalent to using the mean of the observed values, and when NSE becomes negative, the model performs worse than simply using that mean.
R² evaluates the goodness of fit between predicted and actual values, computed as the squared Pearson correlation coefficient between them:

$$R^2 = \left( \frac{\sum_{t}(y_t - \bar{y})(\hat{y}_t - \bar{\hat{y}})}{\sqrt{\sum_{t}(y_t - \bar{y})^2}\,\sqrt{\sum_{t}(\hat{y}_t - \bar{\hat{y}})^2}} \right)^{2}. \quad (7)$$
The Bias metric indicates the deviation of model predictions from observed values and the extent of over- or underestimation:

$$\mathrm{Bias} = \frac{1}{n}\sum_{t=1}^{n} (\hat{y}_t - y_t). \quad (8)$$
RMSE (Root Mean Square Error) summarizes the model's average predictive error as the square root of the mean squared difference between predicted and observed values:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} (\hat{y}_t - y_t)^2}. \quad (9)$$
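The metrics above can be sketched in NumPy. The KGE here follows the Kling et al. (2012) variant with the coefficient-of-variation ratio for variability, R² is taken as the squared Pearson correlation, and Bias as the mean error; where the paper's exact conventions are not shown, these choices are our assumptions:

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta Efficiency, 2012 variant (CV ratio for variability)."""
    r = np.corrcoef(sim, obs)[0, 1]
    beta = sim.mean() / obs.mean()
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())
    return 1 - np.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)

def nse(sim, obs):
    """Nash-Sutcliffe Efficiency: 1 is perfect, 0 matches the obs mean."""
    return 1 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def r2(sim, obs):
    """Squared Pearson correlation (one common convention for R^2)."""
    return np.corrcoef(sim, obs)[0, 1] ** 2

def bias(sim, obs):
    """Mean error: positive values indicate overestimation (assumption)."""
    return np.mean(sim - obs)

def rmse(sim, obs):
    """Root mean square error between predictions and observations."""
    return np.sqrt(np.mean((sim - obs) ** 2))
```

In the evaluation protocol of Section 2.4, each of these would be computed per grid point and summarized by its global median.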
4. Discussion
We conducted an exhaustive comparative analysis of multitask and single-task models in forecasting hydrological variables, employing various evaluation metrics such as R2, NSE, KGE, RMSE, and Bias. Our findings consistently demonstrate the superior performance of the multitask model across the majority of scenarios. Further discussion will delve into the factors influencing multitask learning, provide further analysis of our experimental results, and shed light on some inherent limitations of our multitask model design.
4.1. Performance of Multitask Models
Figure 2 and
Figure 3 clearly show that, compared to single-task models, multitask learning models produce more consistent predictions with fewer outliers and extreme values. This advantage stems from the ability of multitask learning to improve prediction stability by learning the interactions between multiple hydrological variables. Additionally,
Figure 4 demonstrates that multitask models achieve higher prediction accuracy than single-task models across most global grid points at a resolution of 1°. This improved accuracy is due to the multitask models receiving information from multiple variables, allowing them to capture features more closely aligned with the actual hydrological cycle, thereby enhancing the models’ generalization performance. Furthermore, the 2020 annual prediction analysis for three climatic regions, as shown in
Figure 5 and
Figure 6 (highlighted by the red circles), demonstrates the multitask model’s sensitivity to seasonal variability in the variables. Moreover, during periods of significant change, the multitask model exhibits far superior predictive performance compared to the single-task model. Compared to single-task models, multitask models can produce predictions that more closely match real data even when data are sparse (
Table 3). This advantage offers a new perspective for addressing the issue of limited data for certain specific variables in the field of hydrology.
4.2. Factors Influencing Multitask Learning
Our research findings indicate that when high-precision tasks predominantly influence parameter updates in a multitask model (i.e., when high-precision tasks carry greater task weights), the model's predictive performance is significantly enhanced. For example, consider the three variables SM1, ET, and ST1, whose single-task NSE scores were 0.58, 0.69, and 0.92, respectively: ET is predicted more accurately than SM1 but less accurately than ST1. In our experiments, combining SM1 with either ET or ST1 and assigning the higher weight to ET or ST1 greatly improved the model's performance. When combining ET with ST1, assigning the higher weight to ST1 yielded the largest gain in predictive capability.
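The weighting scheme described above can be sketched as a weighted sum of per-task losses. This is a minimal illustration, not our training code: the loss values below are hypothetical, and the weights are normalized so that schemes such as (1, 3) and (0.25, 0.75) are equivalent.

```python
import numpy as np

def combined_loss(task_losses, task_weights):
    """Combine per-task losses with fixed weights, normalized to sum to 1.
    The weighted sum drives the shared-layer gradient, so up-weighting a
    task lets it dominate parameter updates."""
    w = np.asarray(task_weights, float)
    return float(np.dot(w / w.sum(), np.asarray(task_losses, float)))

# Hypothetical per-task losses for SM1 and ST1 (illustrative values only).
loss_sm1, loss_st1 = 0.40, 0.08
# Up-weighting the higher-precision task (ST1) lets it dominate the
# shared gradient, as in the experiments described above.
total = combined_loss([loss_sm1, loss_st1], task_weights=[1, 3])
```

With weights (1, 3) the combined loss is 0.25 * 0.40 + 0.75 * 0.08 = 0.16, versus 0.24 under equal weighting, so the shared layers are pulled more strongly toward the high-precision task.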
Furthermore, our results demonstrate that the efficacy of multitask learning relies on the presence of positive transfer relationships among variables. For instance, in the experiment involving SSHF and ST1, the absence of a significant transfer relationship between ST1 and SSHF resulted in no improvement in SSHF’s prediction accuracy, despite the high-precision ST1 dominating the gradient of loss in the LSTM layer’s parameter updates. Hence, the successful application of multitask learning depends on the relationships among variables. Multitask learning can enhance model predictive performance through positive inter-variable relationships; however, in cases of negative transfer or insignificance, it fails to aid in enhancing the model’s performance.
The allocation of task weights can, to some extent, mitigate negative transfer, in which jointly training multiple tasks degrades model performance. Alternatively, it allows the model to assign greater weight to positively correlated tasks, fully leveraging task relationships to facilitate training.
4.3. Selection of Multitask Forecasting Variables
To leverage the benefits of multitask learning, the variables forecast jointly by the multitask model must share common drivers (such as physical, biological, or chemical relationships). The four hydrological variables selected in this study (SM1, ET, ST1, and SSHF) are closely interconnected within the hydrological cycle. Additionally, other relevant observational variables, such as runoff and surface heat radiation, could be incorporated, as they are linked to the selected variables through governing relationships such as the water balance and surface energy balance equations. As indicated by the results of Experiment D, the multitask model generalizes well with respect to the number of forecast tasks. Thus, given an adequate volume of training data for the chosen variables, multitask learning can significantly enhance model performance.
4.4. Task Loss Weight Allocation
In Experiment A, we employed a manual task weight control approach to explore how task loss weighting impacts the performance of multitask models. While useful for gaining an overall understanding of weight distribution, this method falls short of identifying precise optimal weight combinations. Therefore, we advocate using task weight balancing algorithms within multitask learning frameworks when modeling multiple tasks. These algorithms efficiently attain near-optimal weight combinations, generalize well with respect to the number of forecast tasks, and automatically account for intricate inter-variable relationships.
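As one illustration of such an algorithm, the sketch below implements Dynamic Weight Averaging (DWA) as proposed by Liu et al. (2019). This is one possible balancing scheme, not necessarily the one used in our experiments; the temperature T and the loss histories are illustrative.

```python
import math

def dwa_weights(loss_history, T=2.0):
    """Dynamic Weight Averaging (Liu et al., 2019): weight each task by the
    relative descending rate of its loss over the last two epochs, softened
    by a temperature T. Tasks whose losses fall more slowly get larger
    weights. loss_history is a list of per-epoch lists of task losses."""
    K = len(loss_history[-1])
    if len(loss_history) < 2:
        return [1.0] * K  # equal weights until enough history accumulates
    # Ratio of each task's loss in the last epoch to the one before it.
    r = [loss_history[-1][k] / loss_history[-2][k] for k in range(K)]
    e = [math.exp(rk / T) for rk in r]
    s = sum(e)
    # Weights sum to K, matching an equal-weight baseline of 1 per task.
    return [K * ek / s for ek in e]
```

In this scheme, a task whose loss has plateaued (ratio near 1) is up-weighted relative to a task whose loss is still dropping quickly, which tends to keep all tasks training at a comparable pace.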
4.5. Limitations of the Study and Future Prospects
In this work, we devised a model to assess the feasibility and advantages of multitask learning for hydrological variables. However, the relative simplicity of our multitask model structure may constrain its capacity to capture intricate task relationships. In other domains, numerous sophisticated multitask models have been developed to tackle complex predictions across diverse tasks or scenarios. In computer vision, for instance, Liu et al. (2019) proposed an end-to-end Multi-Task Learning with Attention (MTAN) framework comprising a shared network with a global feature pool and a soft attention module for each task. These modules extract the features pertinent to a specific task from the global features, and the framework is robust to the choice of task loss weighting scheme. In contrast, multitask learning in the hydrological domain has largely remained at the stage of applying simple models to multiple variable features. Our future efforts will therefore be directed towards developing a hydrology-specific multitask learning model grounded in the distinctive data characteristics of hydrological variables.
5. Conclusions
In this study, we developed an LSTM model based on multitask learning principles to exploit relationships between hydrological variables, enhancing predictive performance. We conducted four experiments using the LandBench1.0 dataset: (A) exploring different weight allocation schemes, (B) investigating variable combinations, (C) analyzing different data volumes, and (D) examining models with three variables. In light of the experiments conducted, our study yields several key findings regarding the application of multitask learning in hydrological variable prediction.
Firstly, employing optimal weights for variables in multitask models results in significant enhancements in predictive performance, as demonstrated by notable improvements in R2, KGE, and NSE metrics for various variable combinations compared to single-task models. Specifically, the SM1-ET model showed improvements of 0.23, 0.06, and 0.11 for SM1, and 0.06, 0.04, and 0.03 for ET. In the SM1-ST1 model, SM1’s metrics improved by 0.2, 0.04, and 0.1, and in the ET-ST1 model, ET’s metrics improved by 0.09, 0.05, and 0.05. These findings underscore the effectiveness of multitask learning in enhancing LSTM model performance for hydrological variables.
Secondly, the multitask model consistently outperforms the single-task model as the number of forecast tasks increases, provided the relationships among the variables remain consistent. This highlights the robustness of the multitask model in handling varying numbers of tasks without compromising performance.
Thirdly, our findings indicate that multitask learning allows for more effective utilization of training data by the model, as evidenced by superior predictive performance with a smaller training data volume compared to the single-task model. This suggests that multitask learning optimizes data utilization and enhances model efficiency.
Lastly, the allocation of task loss weights significantly influences the effectiveness of multitask learning. Variables with strong positive correlations, such as SM1 with ET and SM1 with ST1, prove advantageous for multitask learning, leading to substantial performance enhancements. Conversely, unidirectional transfer relationships and combinations lacking significant correlation may constrain the efficacy of multitask learning, highlighting the critical role of inter-variable relationships in influencing model performance.
In conclusion, our study underscores the effectiveness of multitask learning in improving hydrological variable prediction, offering insights into its mechanisms and highlighting its potential applications in hydrological modeling and forecasting.