Article

Deep Reinforcement Learning for Multi-Objective Real-Time Pump Operation in Rainwater Pumping Stations

1 Department of Civil and Environmental Engineering, Dongshin University, Naju 58245, Republic of Korea
2 Department of Software Engineering, Chonnam National University, Gwangju 61186, Republic of Korea
3 Department of Computer Science, Dongshin University, Naju 58245, Republic of Korea
* Author to whom correspondence should be addressed.
Water 2024, 16(23), 3398; https://doi.org/10.3390/w16233398
Submission received: 28 October 2024 / Revised: 15 November 2024 / Accepted: 23 November 2024 / Published: 26 November 2024
(This article belongs to the Section Urban Water Management)

Abstract

Rainwater pumping stations located near urban centers or agricultural areas help prevent flooding by activating an appropriate number of pumps with varying capacities based on real-time rainwater inflow. However, relying solely on rule-based pump operations that monitor only basin water levels is often insufficient for effective control. In addition to maintaining a low maximum water level to prevent flooding, pump operation at rainwater stations also requires minimizing the number of pump on/off switches. Reducing pump switch frequency lowers the likelihood of mechanical failure and thus decreases maintenance costs. This paper proposes a real-time pump operation method for rainwater pumping stations using Deep Reinforcement Learning (DRL) to meet these operational requirements simultaneously, based only on currently observable information such as rainfall, inflow, storage volume, basin water level, and outflow. Simulated rainfall data with various return periods and durations were generated using the Huff method to train the model. The Storm Water Management Model (SWMM), configured to simulate the Gasan rainwater pumping station located in Geumcheon-gu, Seoul, South Korea, was used to conduct experiments. The performance of the proposed DRL model was then compared with that of the rule-based pump operation currently used at the station.

1. Introduction

Rainfall in specific urban areas and surrounding regions flows through complex drainage pathways before reaching rivers. However, in the event of a sudden downpour, direct inflow into rivers can lead to flooding, causing significant damage to adjacent areas. With the increasing frequency of localized heavy rainfall due to climate change, the scale of flood-related damage has grown to the extent that it poses a substantial burden on national economies [1]. To mitigate flood risks in areas adjacent to rivers, rainwater pumping stations are installed and operated nearby to act as a buffer. These pumping stations are equipped with multiple pumps of varying capacities, depending on the station's size, and discharge rainwater accumulated in retention basins into the river, effectively preventing river overflow and flood damage. Most rainwater pumping stations use a rule-based pump operation strategy that activates or deactivates pumps based on predetermined water levels in the retention basin. However, simple rule-based operations reliant on water levels—or on manual control based on operator experience—are inadequate for responding to the rapid changes in water inflow often triggered by localized heavy rains, which are occurring more frequently due to climate warming [2,3].
Several studies have explored strategies for optimizing urban drainage system operations. Zhuan et al. [4] formulated an optimization problem aimed at minimizing both energy and maintenance costs while maintaining a target water level at pumping stations, proposing dynamic programming as a solution approach. Bachtiar et al. [5] proposed the use of linear programming for the optimal operation of integrated cascade reservoirs, while Jafari et al. [6] suggested applying particle swarm optimization (PSO), a type of metaheuristic algorithm, for real-time pump system operations aimed at urban flood prevention. Similarly, Mounce et al. [7] proposed a real-time control system using a genetic algorithm-based heuristic approach to prevent sewer overflow. While dynamic programming, linear programming, and heuristic algorithms are effective for finding optimal solutions, they face limitations in real-time control when the number of variables to consider increases, as recalculating the optimal solution for each scenario becomes computationally demanding.
In addition to dynamic programming and heuristic algorithms, Model Predictive Control (MPC) is another approach for real-time control. MPC predicts future system behavior and calculates optimal control inputs in real-time based on those predictions. Sadler et al. [8] applied MPC to address challenges in stormwater systems influenced by tidal variations, while Sun et al. [9] proposed an MPC method for the real-time control of the Urban Water Cycle. However, MPC relies on accurate models of the drainage system and weather forecasts. Any errors in these models can impact the performance of the controller. In addition, real-time optimization can be computationally intensive, especially in large, complex urban drainage systems with many control variables.
In recent years, groundbreaking advancements in artificial intelligence and machine learning, especially in deep learning, have spurred extensive research across engineering fields [10,11,12,13,14]. Interest in applying deep learning to urban water management has also intensified, with studies exploring its applications not only in system operation but also in system prediction, asset assessment, planning and maintenance, and anomaly detection [15,16]. Machine learning methods are generally categorized as supervised, unsupervised, or reinforcement learning, depending on the data structure, label availability, and learning approach. Consequently, selecting an appropriate deep learning model requires careful consideration of the specific characteristics and objectives of the application.
In the operation of rainwater pumping stations, determining the optimal pump activation policy based on factors such as rainfall and retention basin water levels cannot be addressed effectively using supervised learning models, which require predefined correct responses for all scenarios. Similarly, unsupervised learning, which is often used for data distribution or feature analysis, is not well-suited to this task. For real-time optimization of pump operations, where the model must achieve objectives such as minimizing retention basin water levels in response to evolving conditions, Deep Reinforcement Learning (DRL) is considered the most appropriate approach due to its ability to leverage reward mechanisms [15,17]. Consequently, recent studies have begun exploring the use of DRL for real-time control of stormwater drainage systems [18,19,20].
Mullapudi et al. [21] proposed a Deep Q-Network (DQN), a type of deep reinforcement learning, for maintaining optimal water levels across multiple ponds in stormwater systems by scheduling valve operations based on real-time data, such as rainfall and water levels, collected from sensors. They demonstrated that the performance of DRL-based control systems is highly sensitive to the type of reward function used. Bowes et al. [22] introduced a Deep Deterministic Policy Gradient (DDPG) approach to automatically adjust valves and reduce flood volume, using predictive information on future rainfall and tidal conditions. Saliba et al. [23] applied the DDPG algorithm for stormwater system control, showing that it outperformed rule-based methods even when using uncertain rainfall and water level predictions to mitigate flooding effectively. Meanwhile, Xu et al. [24] utilized DQNs to optimize power generation in the operation of hydropower reservoir systems.
Most studies on DRL applications focus on fine-tuning one or more pumps or valves throughout urban drainage systems within the broader stormwater network, which generally requires a smart stormwater infrastructure [25]. Research specifically targeting pump operations at individual rainwater pumping stations, without relying on such extensive infrastructure, is relatively rare. Additionally, apart from Zhuan et al. [4], most studies on real-time control systems focus on a single objective, such as maintaining low water levels or preventing overflow. However, the optimal operation of rainwater pumping stations involves not only keeping retention basin water levels low but also minimizing fluctuations in the number of active pumps to reduce energy consumption and pump wear. Furthermore, most existing studies primarily address water level control under typical rainfall conditions, treating extreme rainfall events as exceptional cases. This highlights the importance of research into proactive, real-time pump operation strategies for effectively managing extreme rainfall events. The advantages and disadvantages of various models applicable to drainage systems, such as stormwater pumping stations, are summarized and compared in Table 1.
This study aims to address a multi-objective optimization problem for rainwater pump station operation under extreme rainfall conditions. Specifically, it seeks to minimize not only water levels but also the maintenance costs associated with pump combination adjustments. To tackle this, we rigorously define the multi-objective optimization problem and propose appropriate reward functions along with the Double Deep Q-Network (DDQN), an improved version of the Deep Q-Network. Currently, research on this specific problem and DRL-based solutions is scarce.
DQN, a reinforcement learning method that enables an agent to select optimal actions within a given environment, has been applied in areas such as Atari games [26], robotic control [27,28], and water management systems [29,30]. DDQN is particularly known for its ability to address the common issue of overestimation bias in Q-learning algorithms. By using separate Q-network and Target network structures, DDQN provides stable value estimates, and its use of experience replay enhances data efficiency.
Unlike most prior studies, which use basic deep neural networks as agents without accounting for the temporal nature of pump station operation data, this research employs the Gated Recurrent Unit (GRU), a deep learning model suited for time-series data, as the agent. Additionally, to enable proactive learning and response to extreme rainfall driven by climate change, we generated synthetic extreme rainfall data with varying return periods and rainfall durations using the Huff method. To verify the practical applicability and performance of the proposed method, we modeled the retention basin and pump configuration of the Gasan pumping station in Seoul, South Korea, and conducted simulations using the Storm Water Management Model (SWMM) with the generated synthetic rainfall data.
The key contributions of this paper are as follows:
  • We define a multi-objective optimization problem that considers both the minimization of retention basin water levels and the reduction of maintenance costs due to pump switching. To address this, we developed a DDQN model.
  • By incorporating a time-series-aware agent and designing an effective reward function, our experimental results demonstrate that the proposed model can maintain lower water levels in the retention basin compared to rule-based pump policies while simultaneously minimizing maintenance costs.
  • We accurately modeled the pump and retention basin environment of the Gasan pumping station in Seoul, comparing the DDQN-based approach with the rule-based method currently in use, thereby providing insights for potential operational improvements.
  • We developed a control system that can effectively respond to rainfall fluctuations resulting from climate change and rapid urbanization. This system was tested using synthetic extreme rainfall scenarios, rather than typical rainfall, to ensure robust performance under severe weather conditions.

2. Materials and Methods

2.1. Modeling the Pumping Station

The Gasan Rainwater Pumping Station, located in Geumcheon-gu, Seoul, South Korea, was used as the simulated environment (Figure 1). The station lies in a flood-prone area and was selected for this study because of its suitability for analyzing flood management strategies. Table 2 shows the relationship between the elevation (in meters above sea level) and the storage capacity of the detention reservoir at the Gasan Pumping Station, obtained through actual measurements. The maximum water level of the pumping station is specified as 10 m. The relationship between storage volume v and water level h at intermediate volumes can be derived using Equation (1) based on Table 2.
$h = H_{\text{floor}}(v) + \frac{H_{\text{ceil}}(v) - H_{\text{floor}}(v)}{V_{\text{ceil}}(v) - V_{\text{floor}}(v)} \left( v - V_{\text{floor}}(v) \right)$  (1)
Here, $V_{\text{ceil}}(v)$ and $V_{\text{floor}}(v)$ represent the closest storage volumes in Table 2 that are, respectively, just above and just below a given storage volume v. Similarly, $H_{\text{ceil}}(v)$ and $H_{\text{floor}}(v)$ denote the corresponding water levels for $V_{\text{ceil}}(v)$ and $V_{\text{floor}}(v)$.
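As an illustration, Equation (1) amounts to piecewise-linear interpolation between the breakpoints of Table 2. The following minimal Python sketch (the function name and the subset of breakpoints used in the example are ours) shows one way to compute the water level from a storage volume:

```python
import bisect

def level_from_volume(v, volumes, levels):
    """Equation (1): linearly interpolate the water level h for a storage
    volume v from the elevation-storage breakpoints of the detention basin."""
    if v <= volumes[0]:
        return levels[0]
    if v >= volumes[-1]:
        return levels[-1]
    j = bisect.bisect_right(volumes, v)          # index of V_ceil(v)
    v_floor, v_ceil = volumes[j - 1], volumes[j]
    h_floor, h_ceil = levels[j - 1], levels[j]
    return h_floor + (h_ceil - h_floor) / (v_ceil - v_floor) * (v - v_floor)

# A subset of the Table 2 breakpoints (volume in m^3, level in m above sea level).
VOLUMES = [0.0, 1000.0, 2407.0, 4452.0, 8105.0, 16000.0]
LEVELS = [4.7, 5.8, 6.3, 7.0, 8.0, 10.0]

print(round(level_from_volume(3000.0, VOLUMES, LEVELS), 3))  # level between 6.3 m and 7.0 m
```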
The pumping station responsible for regulating the retention basin water level is equipped with three pumps with a capacity of 100 cubic meters per minute (m3/min) and two pumps with a capacity of 170 m3/min. Counting only how many pumps of each capacity are running, there are a total of 12 possible pump operation combinations (four possible counts of the 100 m3/min pumps times three possible counts of the 170 m3/min pumps). However, in actual practice, the Gasan pumping station operates using only the six pump combinations shown in Table 3, so only these six combinations were used in the simulation. For instance, the combination [0, 0, 0, 0, 0] represents the scenario where no pumps are operating, while [1, 1, 1, 1, 0] indicates that three 100 m3/min pumps and one 170 m3/min pump are in operation. These six pump combinations serve as the selectable actions for the model to respond to environmental conditions.
The current pump operation strategy at the Gasan pumping station is outlined in Table 4. When the retention basin’s water level reaches 6.2 m, one 100 m3/min pump is activated. For every additional 0.1 m increase in water level, additional pumps are sequentially activated until the water level reaches 6.6 m, at which point all five pumps are running. Conversely, when the water level drops below 5.9 m, pumps are sequentially deactivated in reverse order at 0.1 m intervals, stopping all pumps once the water level reaches 5.5 m.
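For reference, the hysteresis rule described above can be written as a short function. This is a sketch of our reading of the rule (the returned value is the number of pumps running rather than an action index), not the station's actual control code:

```python
def rule_based_pumps(level_m, pumps_on):
    """Hysteresis rule from the text: at 6.2 m one pump runs and each further
    0.1 m switches on one more pump (all five at 6.6 m); below 5.9 m pumps are
    switched off one by one until all are off at 5.5 m. `pumps_on` is the
    number of pumps currently running; returns the new count."""
    rising = [(6.6, 5), (6.5, 4), (6.4, 3), (6.3, 2), (6.2, 1)]
    falling = [(5.5, 0), (5.6, 1), (5.7, 2), (5.8, 3), (5.9, 4)]
    for threshold, n in rising:           # switch pumps on as the level rises
        if level_m >= threshold and n > pumps_on:
            return n
    for threshold, n in falling:          # switch pumps off as the level drops
        if level_m <= threshold and n < pumps_on:
            return n
    return pumps_on                       # otherwise keep the current combination

print(rule_based_pumps(6.45, 0))  # -> 3 pumps running
print(rule_based_pumps(5.65, 3))  # -> 2 pumps running
```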
To simulate the pump operations at the Gasan rainwater pumping station, we used the Storm Water Management Model (SWMM). The SWMM is a simulation system developed by the United States Environmental Protection Agency (EPA) to support the planning, analysis, and design of drainage systems related to flood control, wastewater treatment, and stormwater management [31]. By incorporating site-specific information, such as drainage infrastructure and rainfall data, the SWMM can predict flooding and pollutant loads over specified time periods. For the implementation of our simulation, we used PySWMM 1.2.0, a Python interface provided by the EPA that wraps SWMM5 [32].
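A minimal PySWMM control loop of the kind used here might look as follows. The input file name 'gasan.inp' and the node ID 'detention_basin' are placeholders, and the pump-control step is only indicated in a comment:

```python
from pyswmm import Simulation, Nodes

# Sketch of stepping a SWMM model with PySWMM at a fixed control interval.
with Simulation('gasan.inp') as sim:         # placeholder input file
    sim.step_advance(120)                    # advance in 2 min control steps
    basin = Nodes(sim)['detention_basin']    # placeholder storage node ID
    for _ in sim:                            # one iteration per control step
        depth = basin.depth                  # current water depth (m)
        inflow = basin.total_inflow          # inflow into the storage node
        # ...observe the state, let the agent pick a pump combination, and
        # apply it here (e.g., via the target_setting of the pump Links)...
```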

2.2. Rainfall Data

Rainfall data for the experiments were generated using frequency-based probability rainfall, which is commonly used in rainwater pumping station design. Probability rainfall consists of rainfall amount, rainfall duration, and rainfall return period (frequency). To distribute the probability of rainfall over time, we used Huff’s dimensionless cumulative curve method [33]. This method, which leverages historical rainfall observations, is widely recognized for its reliability and is extensively used in practical applications, such as in river and small stream design standards [34]. Moreover, the efficient generation of rainfall patterns using this method can reduce simulation time and costs, while also providing realistic rainfall patterns by reflecting intensity variations in actual rainfall events. This approach is therefore also widely used in hydrology and civil engineering to assess structural safety under extreme rainfall conditions.
To reflect extreme rainfall patterns in the region, we used probabilistic maximum rainfall based on return periods and historical data from the area where the actual pumping station is located. A statistical optimization method, the Least Squares Method, was then applied to calculate weights suitable for the four Huff quartiles. These weights were adjusted to capture a range of rainfall events, enabling a realistic simulation of scenarios that the pump station may encounter. To address the importance of managing extreme rainfall events driven by climate change, rainfall samples were generated for 10-, 20-, 30-, 50-, 80-, and 100-year return periods. These samples were divided into nine rainfall durations: 60, 120, 180, 240, 360, 540, 720, 1080, and 1440 min. The estimated probability rainfall amounts for each duration in Seoul are shown in Table 5.
For each amount of rainfall, the distribution was spread across four quartiles of the Huff curve, generating four samples per amount and resulting in a total of 216 samples (6 × 9 × 4). Additionally, using a normal distribution with a mean set to each rainfall amount and a standard deviation of 0.5, 15 probabilistic rainfall values were generated for each scenario, resulting in a total of 3240 rainfall samples (216 × 15) through the same method. After filtering out problematic samples, such as those with negative rainfall values, a final set of 3200 rainfall samples was used as data. Equation (2) and Table 6 below provide the regression formula for probabilistic rainfall amounts in Seoul and the weighting factors applied to the four quartiles in the regression.
$y = w_1 x + w_2 x^2 + w_3 x^3 + w_4 x^4 + w_5 x^5 + w_6 x^6$  (2)
By running the SWMM model for the Gasan rainwater pumping station’s retention basin with the generated rainfall data, we can calculate the inflow rate into the retention basin. Figure 2 illustrates the results when the SWMM is executed using a rainfall data sample with a 30-year return period and a 60 min duration. Figure 2a shows the cumulative rainfall over time, while Figure 2b presents the corresponding inflow rate into the retention basin.
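The sample generation described above can be sketched as follows. The cumulative curve values below are illustrative placeholders (the actual quartile curves come from Equation (2) with the Table 6 weights), while the 78.6 mm total (30-year, 60 min event in Table 5) and the N(mean, 0.5) perturbation follow the text:

```python
import numpy as np

rng = np.random.default_rng(42)

def huff_hyetograph(total_mm, cumulative_curve, n_steps):
    """Distribute a rainfall total over time with a Huff dimensionless
    cumulative curve (values in [0, 1] at equally spaced time fractions)."""
    t = np.linspace(0.0, 1.0, n_steps + 1)
    cum = np.interp(t, np.linspace(0.0, 1.0, len(cumulative_curve)), cumulative_curve)
    return total_mm * np.diff(cum)               # incremental rainfall per step

# Illustrative second-quartile-style cumulative curve (placeholder values).
curve = [0.0, 0.10, 0.35, 0.70, 0.88, 0.96, 1.0]

# 15 perturbed totals around the 30-year, 60 min design rainfall of 78.6 mm.
totals = rng.normal(loc=78.6, scale=0.5, size=15)
totals = totals[totals > 0]                      # drop problematic (negative) samples
samples = [huff_hyetograph(v, curve, n_steps=30) for v in totals]  # 2 min steps
```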

2.3. Problem Formulation

The problem addressed by the rainwater pumping station system can be defined as multi-objective pump combination optimization, aiming to minimize the maximum water level in the retention basin and reduce the number of pump switches.
  • [Multi-Objective Pump Combination Selection Problem]
Given a set of n pumps with different capacities, P = {p1, p2, p3, …, pn}, a collection of k pump combinations, A = {a1, a2, a3, …, ak}, and an objective function ρ: A → Q, find the optimal sequence of pump combinations over a given period T. This optimal sequence, πT* = (ai1, ai2, ai3, …, aiT), should optimize the objective function evaluated over unit time intervals to achieve optimal operation.
The objective function ρ, which we aim to achieve through pump combinations, consists of two goals:
(1) minimizing the maximum water level in the retention basin and (2) minimizing the number of pump switches.
Minimizing the maximum water level in the retention basin means finding a sequence of pump combinations that ensures that the highest water level observed across all unit times within the total simulation period T is as low as possible (Equation (3)).
$\underset{\pi_i}{\arg\min}\; H(\pi_i), \qquad \pi_i \in \pi_T$  (3)
$H(\pi_i) = \max \{\, h_{\pi_i}(t) \mid 1 \le t \le T \,\}$
Here, $h_{\pi_i}(t)$ represents the water level in the detention basin at time t.
Minimizing pump switching is essential to prevent wear from frequent changes in the pump combinations selected at each time unit. The cost associated with these changes is referred to as the maintenance cost. Since defining the maintenance cost precisely can be challenging in practice, the number of pump on/off switches is used as a proxy [4]. The minimization of pump switching can be defined as shown in Equation (4).
$\underset{\pi_i}{\arg\min}\; F(\pi_i), \qquad \pi_i \in \pi_T$  (4)
$F(\pi_i) = \sum_{t=1}^{T} f_{\pi_i}(t)$
Here, $f_{\pi_i}(t)$ represents the number of pump on/off switches at time t. Therefore, by combining the two optimization objectives with respective weights $w_1$ and $w_2$, the objective function can be expressed as shown in Equation (5). The ultimate goal is to find the optimal action sequence $\pi_T^*$ that satisfies Equation (5).
$\underset{\pi_i}{\arg\min}\; O(\pi_i) = w_1 \cdot H(\pi_i) + w_2 \cdot F(\pi_i), \qquad \pi_i \in \pi_T$  (5)
When pump capacities vary, the size of the solution space of possible pump combination sequences $\pi_T$ over period T is $2^{nT}$, since each of the n pumps can be on or off at every one of the T time steps. As the number of pumps n increases or the period T extends, the solution space grows exponentially, making it challenging to identify the optimal solution. This figure counts all possible pump combination sequences over period T without considering environmental factors such as rainfall or retention basin water levels, meaning that the problem becomes even more complex when these conditions are accounted for.
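Under these definitions, evaluating a candidate sequence reduces to computing H, F, and their weighted sum. A small sketch is shown below; it treats the switch count at each step as the sum of on/off differences between consecutive combinations, consistent with using switches as a maintenance-cost proxy:

```python
def objective(levels, pump_states, w1=1.0, w2=1.0):
    """Equations (3)-(5): combined objective for one pump-combination sequence.
    `levels` is the water level h(t) per time step; `pump_states` is the on/off
    vector (e.g., [1, 1, 0, 0, 0]) selected at each step."""
    H = max(levels)                                           # Equation (3)
    F = sum(                                                  # Equation (4)
        sum(abs(a - b) for a, b in zip(prev, curr))           # pumps switched at t
        for prev, curr in zip(pump_states[:-1], pump_states[1:])
    )
    return w1 * H + w2 * F                                    # Equation (5)

# Toy example: three steps, one pump switched on at the last step.
print(objective([5.8, 6.1, 6.0], [[0] * 5, [0] * 5, [1, 0, 0, 0, 0]]))
```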

2.4. Double Deep Q-Network for Pumping Systems

2.4.1. Reinforcement Learning

Reinforcement learning is a machine learning approach in which an agent observes the state of a given environment and selects actions to maximize cumulative rewards [35,36]. Reinforcement learning can be modeled as a Markov Decision Process (MDP), which is a discrete-time stochastic process represented by a 5-tuple (t, S, R, A, Pr). Here, t denotes time, S is the set of states, R represents the reward, A is the set of actions, and Pr is the state transition probability.
The decision-making agent considers the current state s at time step t and selects a specific action a from the action set A. Based on the chosen action a and state s, the system transitions to a new state s′ at time step t + 1. This state transition follows the transition likelihood, or state transition probability, Pr(s, s′) = Pr(st+1 = s′|st = s). When the state changes, the agent receives a reward or penalty. In other words, the agent selects actions within the environment, thereby altering the state and obtaining corresponding rewards. The objective of reinforcement learning is for the agent to maximize cumulative rewards by interacting with the environment.

2.4.2. Double Deep Q-Network

The DQN is an important algorithm in reinforcement learning that combines Q-learning with deep learning [37]. The DQN allows agents to learn optimal actions in environments. The main idea behind the DQN is to use a deep learning model such as a Deep Neural Network (DNN) to approximate the Q-values, instead of using a table.
Q-learning is a fundamental reinforcement learning algorithm where an agent learns the value of taking a particular action in a given state. This value is called the Q-value, and it represents the expected cumulative future reward. In Q-learning, the agent updates the Q-values iteratively using the Bellman Equation.
$Q(s, a) \leftarrow Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right)$  (6)
Here, s and s′ indicate the current state and the next state, respectively. a is an action taken in the current state, r is the immediate reward after taking action a, γ is a discount factor for future rewards, and α is the learning rate. Q-learning updates Q-values based on the reward r and the maximum Q-value in the next state s′, leading the agent to learn a policy that maximizes the long-term reward. In the DQN, the deep learning model takes the current state as input and outputs Q-values for all possible actions. The goal is to train this network to approximate the Q-values accurately.
A DDQN is an improved version of the standard DQN used in RL [38]. It addresses a key limitation of DQNs, which is their tendency to overestimate action values, leading to instability and suboptimal policies during training. In a regular DQN, the agent learns to maximize cumulative reward by choosing actions based on a learned Q-function, which estimates the expected return (reward) for taking a certain action from a given state. The Q-function is approximated using a deep neural network, and the agent uses an epsilon-greedy policy to balance exploration and exploitation.
During training, both the DQN and the DDQN update their weights to minimize the difference between the predicted Q-values and the target Q-values obtained from the Bellman equation:
$Q(s, a) = r + \gamma \max_{a'} Q'(s', a')$  (7)
Here, Q′ refers to the target Q-network. In the DQN, the max operator $\max_{a'} Q'(s', a')$ tends to overestimate the Q-values because the same network is used for both selecting and evaluating actions. This can result in unstable learning, especially in environments with noisy rewards. The DDQN mitigates this overestimation issue by decoupling action selection and evaluation. The action is selected based on the current Q-network.
$a_{\text{best}} = \underset{a'}{\arg\max}\; Q(s', a'; \theta)$  (8)
where θ denotes the parameters of the current Q-network. The Q-value for this action is then estimated using the target network.
$Q_{\text{target}}(s, a) = r + \gamma\, Q'(s', a_{\text{best}}; \theta')$  (9)
Here, θ′ denotes the parameters of the target network, which is a slowly updated copy of the current Q-network. This decoupling reduces the overestimation of Q-values and makes the training process more stable.
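A minimal PyTorch sketch of this target computation is shown below. It assumes q_net and target_net map a batch of states to Q-values for the six actions; the discount factor of 0.4 follows the value reported in Section 3:

```python
import torch

def ddqn_targets(q_net, target_net, rewards, next_states, dones, gamma=0.4):
    """Double DQN target (Equations (8)-(9)): the online network selects the
    best next action and the target network evaluates it.
    `dones` is 1.0 where the episode ended, 0.0 otherwise."""
    with torch.no_grad():
        next_q_online = q_net(next_states)                    # Q(s', . ; theta)
        best_actions = next_q_online.argmax(dim=1, keepdim=True)
        next_q_target = target_net(next_states).gather(1, best_actions).squeeze(1)
        return rewards + gamma * next_q_target * (1.0 - dones)
```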

2.4.3. Gated Recurrent Unit

Bowes et al. [22] and Saliba et al. [23] aimed to improve decision-making model performance by using future rainfall predictions. However, rainfall forecasting requires separate prediction methods and inherently involves uncertainty. In contrast, this study leverages observable, sequential past states for both training and prediction, avoiding uncertain future predictions. Consequently, the decision-making agent must be a model capable of utilizing time-series data. In this study, we employ a GRU [39], a representative recurrent neural network, as the agent within the DDQN framework to determine optimal pump combinations suitable for the rainwater pumping station environment.
The Gated Recurrent Unit is a type of Recurrent Neural Network (RNN) architecture used in deep learning, particularly for sequence data. GRUs were introduced as a simpler alternative to Long Short-Term Memory (LSTM) networks [40], which are another popular type of RNN. GRUs were designed to solve the vanishing gradient problem that affects traditional RNNs when learning long-term dependencies. GRUs achieve this by using gating mechanisms to control the flow of information and maintain relevant information over long sequences.
A GRU contains two main gates, a reset gate and an update gate. The reset gate controls how much of the previous state is forgotten, whereas the update gate decides how much of the previous state needs to be passed to the next time step. For each time step t, the following operations occur.
The reset gate rt is calculated using the previous hidden state ht−1 and the current input xt as follows.
$r_t = \sigma(W_r [h_{t-1}, x_t] + b_r)$  (10)
Here, Wr and br are the weights and biases, and σ is the sigmoid activation function. The update gate zt is computed similarly to the reset gate, using the previous hidden state and the current input.
$z_t = \sigma(W_z [h_{t-1}, x_t] + b_z)$  (11)
Here, $W_z$ and $b_z$ are the weights and biases for the update gate $z_t$. The reset gate is applied to the previous hidden state to create a candidate hidden state $\tilde{h}_t$. This candidate hidden state incorporates new information.
$\tilde{h}_t = \tanh(W_h [r_t \odot h_{t-1}, x_t] + b_h)$  (12)
Here, $r_t \odot h_{t-1}$ represents the element-wise multiplication of the reset gate and the previous hidden state, allowing the model to selectively ignore parts of the past. The update gate controls the final hidden state by blending the previous hidden state $h_{t-1}$ with the candidate hidden state $\tilde{h}_t$.
$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$  (13)
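For clarity, a single GRU step implementing Equations (10)-(13) can be written directly in NumPy. The weight matrices are assumed to act on the concatenation of the previous hidden state and the current input:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, W_r, b_r, W_z, b_z, W_h, b_h):
    """One GRU step following Equations (10)-(13) as written above."""
    hx = np.concatenate([h_prev, x_t])
    r = sigmoid(W_r @ hx + b_r)                    # reset gate, Eq. (10)
    z = sigmoid(W_z @ hx + b_z)                    # update gate, Eq. (11)
    h_tilde = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]) + b_h)  # Eq. (12)
    return z * h_prev + (1.0 - z) * h_tilde        # new hidden state, Eq. (13)
```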

2.4.4. Model Configuration

The decision-making agent in the DDQN model consists of three GRU layers with 32 nodes each, followed by a fully connected output layer with six nodes. The six nodes in the output layer represent the six available pump combinations. We experimented with various values for the hyperparameters determining the structure of the DDQN and GRU models. As these changes did not result in significant performance differences, we opted to use the smallest possible model configurations. To ensure training stability, the replay method [41] was used, and an epsilon-greedy strategy was employed for action selection. Table 7 details the algorithms and parameters used during training. The chosen optimization algorithms and parameter values were determined through extensive experimentation to optimize model performance.
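A PyTorch sketch of an agent network matching this description (three stacked GRU layers with 32 units and a six-way output layer) might look as follows; the layer names and the use of the last time step's output are our choices:

```python
import torch
import torch.nn as nn

class GRUAgent(nn.Module):
    """Q-network sketch: three stacked GRU layers with 32 units followed by a
    fully connected layer producing one Q-value per pump combination."""
    def __init__(self, state_dim=5, hidden_dim=32, n_actions=6):
        super().__init__()
        self.gru = nn.GRU(state_dim, hidden_dim, num_layers=3, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_actions)

    def forward(self, x):                  # x: (batch, sequence_length, state_dim)
        out, _ = self.gru(x)
        return self.head(out[:, -1, :])    # Q-values from the last time step

q_values = GRUAgent()(torch.zeros(1, 3, 5))   # e.g., a sequence of 3 observed states
print(q_values.shape)                         # torch.Size([1, 6])
```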

2.4.5. States

The state information used by the model to make action decisions in the pumping station environment is represented as a 5-dimensional vector comprising observed rainfall, retention basin inflow, basin water volume, basin water level, and basin outflow. This 5-dimensional vector serves as input data for the model’s training and inference processes. Simulated rainfall data were used for rainfall observations, and inflow values were obtained by running SWMM simulations at 2 min intervals. Likewise, the water volume, water level, and outflow values were derived at 2 min intervals based on the operation results of the selected pump combinations.
The observation values obtained from the evolving environment as the model runs are inherently sequential. To capture this temporal structure, a GRU—well-suited for time-series data—was selected as the agent in the DDQN model. The model can take δ consecutive observations as a single input vector; for instance, if three consecutive observations are used, an input would look like this: [[0.0155, 0.4861, 0.4861, 0.0000, 0.0123], [0.0073, 0.4934, 0.4934, 0.0000, 0.0059], [0.0045, 0.4959, 0.4959, 0.0000, 0.0037]]. In this case, the model would use a sequence of three 5-dimensional vectors, each representing a 2 min interval, effectively training and making predictions based on 6 min of sequential data.
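Constructing these inputs amounts to sliding a window of δ consecutive observations over the state history, as in the following sketch (function name ours):

```python
import numpy as np

def state_windows(states, delta=3):
    """Stack delta consecutive 5-dimensional observations into model inputs of
    shape (sequence_length, state_dim), as in the example above."""
    states = np.asarray(states, dtype=np.float32)
    return np.stack([states[i:i + delta] for i in range(len(states) - delta + 1)])

obs = np.random.rand(10, 5)        # ten 2 min observations of the 5-dim state
print(state_windows(obs).shape)    # (8, 3, 5)
```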

2.4.6. Actions

The GRU agent within the DRL model selects specific actions from a set of possible actions to maximize rewards, based on state information acquired from the environment. As described in Section 2.1, the current Gasan Pumping Station manages water levels in the retention basin with three pumps of 100 m3/min capacity and two pumps of 170 m3/min capacity, utilizing only six specific pump combinations shown in Table 3. The proposed DRL model’s agent is also restricted to these six combinations to ensure a fair performance comparison with the existing rule-based operating method and to facilitate practical applicability in real-world operations.

2.4.7. Reward Function

In the DDQN, the reward function is a crucial component that provides feedback to the agent about how good or bad its actions are within the environment. It directly influences the learning process by guiding the agent toward achieving its goal.
To achieve the two objectives outlined in the problem definition, we developed two reward function components. First, to minimize the maximum water level, we define the water level reward re for a specific action at time t as shown in Equation (14).
$r_e(t) = \frac{f_{\text{out}}(t)}{\sum_{i=1}^{k} c_i} \cdot \frac{f_{\text{level}}(t)}{H_L}$  (14)
Here, $f_{\text{out}}(t)$ and $f_{\text{level}}(t)$ denote the outflow volume from the basin due to the selected action and the reservoir's water level at time t, respectively. k represents the total number of pumps, and $c_i$ indicates the pumping capacity of the i-th pump. $H_L$ is a constant representing the maximum height of the retention basin, which is 10 m for the Gasan pumping station.
Next, the reward function rp(t) for minimizing the number of pump switches at time t can be defined as shown in Equation (15).
$r_p(t) = \frac{1}{k} \sum_{i=1}^{k} \operatorname{and}\!\left( p_i(t-1),\, p_i(t) \right)$  (15)
Here, pi(t) represents the on or off state of pump i at time t, taking a value of 1 or 0. The and() function denotes the logical AND operation. The reward is higher when there are fewer changes in the on/off states of the pumps.
By multiplying each of these two reward functions by their respective weights and summing them, the total reward for a selected action a at a specific time t can be calculated as follows:
$r(a, t) = w_e \cdot r_e(t) + w_p \cdot r_p(t)$  (16)
Here, $w_e$ and $w_p$ represent the weights for the two reward functions.
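Putting Equations (14)-(16) together as reconstructed above, the per-step reward can be sketched as follows. The capacity list follows Section 2.1, the default weights are illustrative (they were tuned experimentally in Section 3), and the normalization of the outflow term reflects our reading of Equation (14):

```python
def reward(outflow, level, prev_pumps, pumps,
           capacities=(100, 100, 100, 170, 170), h_max=10.0, w_e=1.0, w_p=1.0):
    """Per-step reward of Equations (14)-(16) as reconstructed above: r_e rewards
    outflow in proportion to the current water level (both normalised), and r_p
    rewards pumps that stay on between consecutive steps."""
    r_e = (outflow / sum(capacities)) * (level / h_max)            # Equation (14)
    r_p = sum(a and b for a, b in zip(prev_pumps, pumps)) / len(capacities)  # Eq. (15)
    return w_e * r_e + w_p * r_p                                   # Equation (16)

# Example: two 100 m3/min pumps kept running with the basin level at 6.0 m.
print(reward(outflow=200.0, level=6.0,
             prev_pumps=[1, 1, 0, 0, 0], pumps=[1, 1, 0, 0, 0]))
```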

3. Results and Discussion

Of the total 3200 rainfall samples, approximately 80% (2552 samples) were used as training data, and the remaining 20% (648 samples) were used as test data. Each of the 2552 training scenarios was used only once, as in scenario-based learning, which allows new scenarios to be generated continuously; we therefore did not use separate validation data to determine a stopping point, as is common when the same data are reused over multiple epochs. The rainfall data, derived from six return period probabilities and nine durations, were distributed evenly between the training and test datasets, so the test dataset contains 12 scenarios for each combination of return period and duration. The experimental results presented below are averages over these 12 test scenarios.
Figure 3 shows the simulation results when a test data sample from a 30-year return period with a 60 min rainfall duration scenario is used as input for the trained DDQN model. Here, the sequence length δ of the time-series input data is set to 3. Figure 3a illustrates the inflow volume entering the basin every 2 min and the outflow volume discharged based on the pump operation selected by the model. Figure 3b shows the changes in the basin water level at 2 min intervals for the same experiment.
Figure 4 shows the inflow and outflow of the basin when the same scenario as in Figure 3 is applied to the rule-based pump operation method currently used at the rainwater pumping station.
Test data for each of the six return period probabilities across nine durations were used to compare the maximum water level and pump switching frequency under three conditions: a DDQN model focused solely on minimizing the maximum water level (weight we = 1), a DDQN model considering both maximum water level minimization and pump switching minimization (we = 2, wp = 1), and the rule-based model. As with the various parameters used in the DDQN and GRU models, the two weights we and wp in the reward function were also tested extensively with a range of values. The best results were observed when we and wp were set to 2 and 1, respectively.
Figure 5 presents the results of experiments across all rainfall durations for the 30-year return period probability. As shown in Figure 5a, both DDQN models achieved a lower maximum water level than the rule-based model. Notably, in cases of prolonged rainfall durations, such as 720 min or more, the DDQN models maintained a significantly lower maximum water level compared to shorter durations by consistently selecting pump combinations with maximum capacity. Additionally, there was no significant difference in maximum water level performance between the two DDQN models.
However, as shown in Figure 5b, for the number of pump switches, the DDQN model that incorporates switch minimization maintained a low number of changes, similar to the rule-based model, whereas the DDQN model without switch minimization exhibited a high frequency of pump switches. This demonstrates that incorporating switch minimization into the objective function allows for maintaining a low water level without significantly increasing the number of pump changes. Nonetheless, in cases of prolonged rainfall durations, the frequency of selecting high-capacity pump combinations increases, leading to a corresponding rise in pump switch frequency.
Table 8 shows the experimental results for maximum water levels across all scenarios for the three models. Similar to the results shown in Figure 5a for the 30-year return period, both DDQN models achieved lower maximum water levels compared to the rule-based model. Generally, as the return period and rainfall duration increase, the maximum water level tends to rise. However, it was also observed that when rainfall intensity is low relative to the return period or duration, the water level does not reach as high as it does in shorter, more intense rainfall events. Additionally, for the 100-year return period at 120, 180, and 240 min, the rule-based model experienced overflow in all 12 samples, exceeding the basin’s maximum water level, whereas the DDQN models showed almost no instances of overflow.
Table 9 presents the number of pump switches across all scenarios for the three models. As with the 30-year return period case, the DDQN model that considers both maximum water level minimization and pump switch minimization, as well as the rule-based model, shows fewer pump switches compared to the DDQN model focused solely on minimizing the maximum water level. Notably, for scenarios with high rainfall frequency and long durations, the DDQN model that considers both objectives maintains an even lower number of pump switches than the rule-based model, which is expected to reduce maintenance costs associated with pump wear.
As previously mentioned, the state information used to select pump combinations at the rainwater pumping station—such as rainfall, basin water level, and outflow volume—has a time-series nature. The length of the data sequence input into the DDQN model, which is designed to handle time-series data, plays a critical role in model performance. Therefore, we analyzed the impact of sequence length on model performance. We experimented with sequence lengths (number of consecutive states) from 1 to 5 to evaluate the effect on the performance of both DDQN models.
The sequence length represents the duration of observed data used as input by the model. For instance, a sequence length of 3 means that the model uses state information spanning three consecutive 2 min intervals (totaling 6 min) for training and inference. Figure 6a,c show changes in maximum water level and pump switch count for different sequence lengths when only maximum water level minimization (i.e., using only we) is considered in the DDQN model. Figure 6b,d illustrate the performance impact when both objectives, maximum water level minimization (we) and pump switch minimization (wp), are considered.
The results indicate that the sequence length has little effect on the maximum water level for both models. However, its impact on the number of pump switches is less pronounced in the DDQN model that considers both objectives than in the model that considers only maximum water level minimization. Although there is no clear correlation between sequence length and the number of switches, longer sequences tend to help reduce pump switches more effectively than shorter sequences. For instance, the DDQN model with only we achieves the minimum switch count with a sequence length of 4, whereas the DDQN model using both we and wp achieves its minimum with a sequence length of 5.
Table 10 presents the average results of experiments conducted across all scenarios using two different DDQN models and a rule-based algorithm. The DDQN model that incorporates both reward functions (we = 1, wp = 2) shows an improvement of 14.2% in average maximum water level and 6.5% in average pump adjustments compared to the rule-based method.
For most rainfall scenarios, the DDQN (we = 1, wp = 2) model demonstrated lower peak water levels. However, in scenarios with rainfall return periods of over 50 years and durations of 360 min or less, the rule-based method yielded better results in terms of the number of pump switches. In cases where large volumes of rainfall are introduced over short durations, the rapid changes in water levels and retention basin quantities likely prompted the DRL model to respond more precisely than the rule-based method, leading to relatively frequent pump combination adjustments. In all rainfall scenarios except these specific conditions, the DDQN model considering two rewards resulted in fewer pump switches. Further research on DRL control may be necessary to address this issue and enhance performance under these conditions.
An important observation from the experiments is that selecting appropriate hyperparameters, particularly the weights, remains a challenging task that requires extensive testing. Both the sequence length of state observations and the weights in the reward function significantly impact model effectiveness, underscoring the need for further research on optimal value selection methods. Additionally, setting a relatively low discount factor γ (0.4 in this case) for future rewards was beneficial in minimizing the maximum water level. This suggests that placing a higher value on immediate water level control rather than future levels is more effective for this objective, likely because this study relies only on observable past and present rainfall data. If predicted future rainfall data were incorporated, different discount factor values might yield better results. Further research in these areas is essential to enhance model performance and practical applicability.
Due to practical difficulties in accurately estimating maintenance costs, the number of pump switches was used in the reward function as a proxy for maintenance cost. However, to develop a model that can be effectively deployed in the field, it will be necessary to create a cost estimation model that more accurately reflects real maintenance costs. In addition to maintenance costs associated with pump switching, various other practical requirements must also be incorporated into the training model to meet real-world demands. For example, minimizing the power consumption required for pump operation and coordinating multiple pumping stations to manage flood risks are crucial factors. These requirements are expected to increase the complexity of the pump operation problem, likely necessitating novel solution approaches. However, even as the problem’s complexity increases, the nature of the DDQN allows for adjustments to the state and action spaces based on the specific environment and structure of a pump station. Additionally, hyperparameters, such as the number of layers and nodes in the model, can be modified to accommodate increased system complexity.
On the other hand, implementing a DRL-based operational method in the real world requires careful consideration of supporting infrastructure, such as sensors that can ensure the reliability of real-time input data, including accurate rainfall and retention basin inflow/outflow measurements. It is also essential to adjust the model to accommodate hardware limitations in the field, develop models closely aligned with field conditions, and conduct training and testing with actual data. Another area for further research is the development of multi-objective solutions for coordinating multiple pumping stations within the same region, each operating under different conditions, to jointly manage flood risk. Additionally, a comprehensive performance comparison of various DRL models applicable to pump station operations, as well as new training methods to enhance performance, would be valuable avenues for future work.

4. Conclusions

This study designed and implemented a model using a DDQN, a deep reinforcement learning approach, to automate pump operations at a rainwater pumping station in Seoul, South Korea. Rainfall data were generated using the Huff method, covering six return periods and nine durations, and the SWMM was used to simulate the basin and pump system for training and testing the model with the generated synthetic rainfall. To validate the proposed model's performance, we compared it with the rule-based pump operation approach currently used at the actual pumping station. The DDQN model that considers both reward functions (we = 1, wp = 2) demonstrated improvements of 14.2% in average maximum water level and 6.5% in average pump changes compared to the rule-based method. To the best of our knowledge, this is the first study to apply deep reinforcement learning to the operation of pumps in a rainwater pumping station.
By using a GRU model as the agent, which is well-suited for time-series analysis on observable sequential state data, we confirmed that this approach enhances learning effectiveness and improves model performance. Additionally, designing a multi-objective reward function targeting both maximum water level minimization and pump switch minimization allowed the model to effectively prevent basin overflow without incurring additional maintenance costs associated with pump switching. Experiments showed that this approach outperforms rule-based operations in terms of flood prevention. However, to better reflect real-world conditions, it is necessary to develop a more sophisticated cost model that accounts for pump maintenance expenses beyond just maximum water levels. Incorporating this refined cost model into the reward function and analyzing its effects could provide valuable insights for developing intelligent pumps suitable for field applications.

Author Contributions

Conceptualization, J.-G.J. and S.-H.K.; methodology, S.-H.K.; software, S.-H.K. and I.-S.J.; writing—original draft preparation, J.-G.J.; writing—review and editing, S.-H.K.; visualization, I.-S.J.; supervision, S.-H.K.; funding acquisition, J.-G.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Korea Planning & Evaluation Institute of Industrial Technology funded by the Ministry of the Interior and Safety (MOIS, Korea). [Development and Application of Advanced Technologies for Urban Runoff Storage Capability to Reduce the Urban Flood Damage/RS-2024-00415937].

Data Availability Statement

The code used in this paper is available on GitHub at: https://github.com/drminor-dsu/water_torch (accessed on 22 November 2024).

Acknowledgments

The authors would like to thank the editors and the anonymous reviewers for their insightful comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. NOAA National Centers for Environmental Information (NCEI). U.S. Billion-Dollar Weather and Climate Disasters. 2024. Available online: https://www.ncei.noaa.gov/access/billions/ (accessed on 22 November 2024).
  2. Tabari, H. Climate change impact on flood and extreme precipitation increases with water availability. Sci. Rep. 2020, 10, 13768. [Google Scholar] [CrossRef] [PubMed]
  3. Martel, J.L.; Brissette, F.P.; Lucas-Picher, P.; Troin, M.; Arsenault, R. Climate Change and Rainfall Intensity–Duration–Frequency Curves: Overview of Science and Guidelines for Adaptation. J. Hydrol. Eng. 2021, 26, 03121001. [Google Scholar] [CrossRef]
  4. Zhuan, X.; Xiaohua, X. Optimal operation scheduling of a pumping station with multiple pumps. Appl. Energy 2013, 104, 250–257. [Google Scholar] [CrossRef]
  5. Bachtiar, S.; Limantara, L.M.; Sholichin, M.; Seotopo, W. Optimization of Integrated Reservoir for Supporting the Raw Water Supply. Civ. Eng. J. 2023, 9, 860–872. [Google Scholar] [CrossRef]
  6. Jafari, F.; Mousavi, S.J.; Yazdi, J.; Kim, J.H. Real-time operation of pumping systems for urban flood mitigation: Single-period vs. multi-period optimization. Water Resour. Manag. 2018, 32, 4643–4660. [Google Scholar] [CrossRef]
  7. Mounce, S.R.; Shepherd, W.; Ostojin, S.; Abdel-Aal, M.; Schellart, A.N.A.; Shucksmith, J.D.; Tait, S.J. Optimisation of a fuzzy logic based local real-time control system for mitigation of sewer flooding using genetic algorithms. J. Hydroinform. 2020, 22, 281–295. [Google Scholar] [CrossRef]
  8. Sadler, J.M.; Goodall, J.L.; Behl, M.; Bowes, B.D.; Morsy, M.M. Exploring real-time control of stormwater systems for mitigating flood risk due to sea level rise. J. Hydrol. 2020, 583, 124571. [Google Scholar] [CrossRef]
  9. Sun, C.; Puig, V.; Cembrano, G. Real-Time Control of Urban Water Cycle under Cyber-Physical Systems Framework. Water 2020, 12, 406. [Google Scholar] [CrossRef]
  10. Baumeister, T.; Brunton, S.L.; Kutz, J.N. Deep learning and model predictive control for self-tuning mode-locked lasers. J. Opt. Soc. Am. B 2018, 35, 617–626. [Google Scholar] [CrossRef]
  11. Lee, X.Y.; Balu, A.; Stoecklein, D.; Ganapathysubramanian, B.; Sarkar, S. A Case Study of Deep Reinforcement Learning for Engineering Design: Application to Microfluidic Devices for Flow Sculpting. J. Mech. Des. 2019, 141, 111401. [Google Scholar] [CrossRef]
  12. Dargan, S.; Kumar, M.; Ayyagari, M.R.; Kumar, G. A Survey of Deep Learning and Its Applications: A New Paradigm to Machine Learning. Arch. Comput. Methods Eng. 2020, 27, 1071–1092. [Google Scholar] [CrossRef]
  13. Zhang, W.; Li, H.; Li, Y.; Liu, H.; Chen, Y.; Ding, X. Application of deep learning algorithms in geotechnical engineering: A short critical review. Artif. Intell. Rev. 2021, 54, 5633–5673. [Google Scholar] [CrossRef]
  14. Mantach, S.; Lutfi, A.; Moradi Tavasani, H.; Ashraf, A.; El-Hag, A.; Kordi, B. Deep Learning in High Voltage Engineering: A Literature Review. Energies 2022, 15, 5005. [Google Scholar] [CrossRef]
  15. Fu, G.; Jin, Y.; Sun, S.; Yuan, Z.; Butler, D. The role of deep learning in urban water management: A critical review. Water Res. 2022, 223, 118973. [Google Scholar] [CrossRef]
  16. Wu, Z.; Zhou, Y.; Wang, H. Real-Time Prediction of the Water Accumulation Process of Urban Stormy Accumulation Points Based on Deep Learning. IEEE Access 2020, 8, 151938–151951. [Google Scholar] [CrossRef]
  17. Wang, C.; Bowes, B.D.; Beling, A.; Goodall, J.L. Reinforcement Learning for Flooding Mitigation in Complex Stormwater Systems during Large Storms. In Proceedings of the 19th International Conference on Smart Technologies, Lviv, Ukraine, 6–8 July 2021; pp. 274–279. [Google Scholar] [CrossRef]
  18. Tian, W.; Xin, K.; Zhang, Z.; Zhao, M.; Liao, Z.; Tao, T. Flooding mitigation through safe & trustworthy reinforcement learning. J. Hydrol. 2023, 620 Pt A, 129435. [Google Scholar] [CrossRef]
  19. Li, X.; Liang, X.; Wang, X.; Wang, R.; Shu, L.; Xu, W. Deep reinforcement learning for optimal rescue path planning in uncertain and complex urban pluvial flood scenarios. Appl. Soft Comput. 2023, 144, 110543. [Google Scholar] [CrossRef]
  20. Tian, W.; Fu, G.; Xin, K.; Zhang, Z.; Liao, Z. Improving the interpretability of deep reinforcement learning in urban drainage system operation. Water Res. 2024, 249, 120912. [Google Scholar] [CrossRef]
  21. Mullapudi, A.; Lewis, M.J.; Gruden, C.L.; Kerkez, B. Deep reinforcement learning for the real time control of stormwater systems. Adv. Water Resour. 2020, 140, 103600. [Google Scholar] [CrossRef]
  22. Bowes, B.D.; Tavakoli, A.; Wang, C.; Heydarian, A.; Behl, M.; Beling, P.A.; Goodall, J.L. Flood mitigation in coastal urban catchments using real-time stormwater infrastructure control and reinforcement learning. J. Hydroinform. 2021, 23, 529–547. [Google Scholar] [CrossRef]
  23. Saliba, S.M.; Bowes, B.D.; Adams, S.; Beling, P.A.; Goodall, J.L. Deep Reinforcement Learning with Uncertain Data for Real-Time Stormwater System Control and Flood Mitigation. Water 2020, 12, 3222. [Google Scholar] [CrossRef]
  24. Xu, W.; Meng, F.; Guo, W.; Li, X.; Fu, G. Deep Reinforcement Learning for Optimal Hydropower Reservoir Operation. J. Water Resour. Plan. Manag. 2021, 147, 04021045. [Google Scholar] [CrossRef]
  25. Ismail, S.; Dawoud, D.W.; Ismail, N.; Marsh, R.; Alshami, A.S. IoT-Based Water Management Systems: Survey and Future Research Direction. IEEE Access 2022, 10, 35942–35952. [Google Scholar] [CrossRef]
  26. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M.A. Playing Atari with Deep Reinforcement Learning. arXiv 2013. [Google Scholar] [CrossRef]
  27. Brunke, L.; Greeff, M.; Hall, A.W.; Yuan, Z.; Zhou, S.; Panerati, J.; Schoellig, A.P. Safe learning in robotics: From learning-based control to safe reinforcement learning. Annu. Rev. Control Robot. Auton. Syst. 2022, 5, 411–444. [Google Scholar] [CrossRef]
  28. Zhao, Y.; Wang, J.; Cao, G.; Yuan, Y.; Yao, X.; Qi, L. Intelligent Control of Multilegged Robot Smooth Motion: A Review. IEEE Access 2023, 11, 86645–86685. [Google Scholar] [CrossRef]
  29. Li, Z.; Bai, L.; Tian, W.; Yan, H.; Hu, W.; Xin, K.; Tao, T. Online Control of the Raw Water System of a High-Sediment River Based on Deep Reinforcement Learning. Water 2023, 15, 1131. [Google Scholar] [CrossRef]
  30. Tian, W.; Xin, K.; Zhang, Z.; Liao, Z.; Li, F. State Selection and Cost Estimation for Deep Reinforcement Learning-Based Real-Time Control of Urban Drainage System. Water 2023, 15, 1528. [Google Scholar] [CrossRef]
  31. United States Environmental Protection Agency. Storm Water Management Model (SWMM). Available online: https://www.epa.gov/water-research/storm-water-management-model-swmm (accessed on 22 November 2024).
  32. McDonnell, B.E.; Ratliff, K.; Tryby, M.E.; Wu, J.J.X.; Mullapudi, A. PySWMM: The Python Interface to Stormwater Management Model (SWMM). J. Open Source Softw. 2020, 5, 1–3. [Google Scholar] [CrossRef]
  33. Huff, F.A. Time distribution of rainfall in heavy storms. Water Resour. Res. 1967, 3, 1007–1019. [Google Scholar] [CrossRef]
  34. Lee, E.H.; Lee, Y.S.; Joo, J.G.; Jung, D.; Kim, J.H. Investigating the Impact of Proactive Pump Operation and Capacity Expansion on Urban Drainage System Resilience. J. Water Resour. Plan. Manag. 2017, 143, 04017024. [Google Scholar] [CrossRef]
  35. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; The MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
  36. Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep Reinforcement Learning: A Brief Survey. IEEE Signal Process. Mag. 2017, 34, 26–38. [Google Scholar] [CrossRef]
  37. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef]
  38. Hasselt, H.V.; Guez, A.; Silver, D. Deep Reinforcement Learning with Double Q-Learning. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; pp. 2094–2100. [Google Scholar]
  39. Cho, K.; van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 25–29 October 2014; pp. 1724–1734. [Google Scholar] [CrossRef]
  40. Chung, J.; Gülçehre, Ç.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv 2014. [Google Scholar] [CrossRef]
  41. Liu, R.; Zou, J.Y. The Effects of Memory Replay in Reinforcement Learning. In Proceedings of the 56th Annual Allerton Conference on Communication, Control, and Computing, Allerton, Monticello, IL, USA, 2–5 October 2018; pp. 478–485. [Google Scholar] [CrossRef]
Figure 1. Gasan pumping station located in Korea.
Figure 2. (a) Cumulative rainfall over time for a 60 min duration sample generated with a 30-year return period. (b) Inflow variation into the detention basin over time, as obtained from the SWMM simulation of rainfall data.
Figure 3. Simulation result of the DDQN model using a 30-year, 60 min test sample as input: (a) inflow to the detention basin and (b) outflow from the detention basin at 2 min intervals.
Figure 4. Simulation result of the rule-based pump operation method using a 30-year, 60 min test sample as input: (a) inflow to the detention basin and (b) outflow from the detention basin at 2 min intervals.
Figure 5. (a) Maximum water level and (b) number of pump changes for three models based on rainfall data for all durations with a 30-year return period.
Figure 6. (a,b) Maximum water level for the DDQN model trained without the pump-change weight wp and for the DDQN model trained with wp, respectively, as a function of the input sequence size. (c,d) Number of pump changes for the two models as a function of the input sequence size.
Table 1. Comparison of Advantages and Disadvantages of Various Models Applicable to Drainage Systems.

Rule-Based Algorithms
  Advantages: Simplicity and ease of implementation; Low computational demand; Predictable behavior; Reliability and robustness
  Disadvantages: Lack of flexibility; Limited optimization; Inability to handle complexity; Limited use of data

Metaheuristics
  Advantages: Global optimization; Applicability to complex problems; Multi-objective optimization
  Disadvantages: High computational demand; Not suited for real-time control; Solution stability issues; Difficulty in interpretation

Model Predictive Control
  Advantages: Predictive capability; Constraint handling; Optimized performance; Applicability to various systems
  Disadvantages: High computational demand; Accurate model requirement; Implementation complexity; Real-time limitations

Deep Reinforcement Learning
  Advantages: Handling of nonlinearities and complex interactions; Adaptability and learning capability; Real-time applicability; Generalization
  Disadvantages: Requires large amounts of data; Long training times; Lack of interpretability
Table 2. Water volume for each height.

Height (m) | Water Volume (m3)
4.7  | 0.0
5.8  | 1000.0
6.3  | 2407.0
7.0  | 4452.0
8.0  | 8105.0
8.2  | 1008.0
10   | 16,000.0
Table 3. Available combinations for pump operation.

Pumping Operation Type | Representation
Action 1: no pumping | [0, 0, 0, 0, 0]
Action 2: one 100 m3/min pump in operation | [1, 0, 0, 0, 0]
Action 3: two 100 m3/min pumps in operation | [1, 1, 0, 0, 0]
Action 4: three 100 m3/min pumps in operation | [1, 1, 1, 0, 0]
Action 5: four pumps in operation (three 100 m3/min and one 170 m3/min) | [1, 1, 1, 1, 0]
Action 6: all pumps operating | [1, 1, 1, 1, 1]
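The action space in Table 3 maps directly onto per-pump on/off vectors. The sketch below is a minimal illustration of that encoding and of one straightforward way to count pump state changes between consecutive decisions; the names ACTIONS and pump_switches are assumptions made here for illustration, not identifiers from the paper, and the paper's exact counting convention for pump changes may differ.

```python
from typing import List

# Action set from Table 3: each action is an on/off vector over the five pumps
# (1 = on, 0 = off). List positions 0-5 correspond to Actions 1-6 in Table 3.
ACTIONS: List[List[int]] = [
    [0, 0, 0, 0, 0],  # Action 1: no pumping
    [1, 0, 0, 0, 0],  # Action 2: one 100 m3/min pump
    [1, 1, 0, 0, 0],  # Action 3: two 100 m3/min pumps
    [1, 1, 1, 0, 0],  # Action 4: three 100 m3/min pumps
    [1, 1, 1, 1, 0],  # Action 5: three 100 m3/min pumps + the 170 m3/min pump
    [1, 1, 1, 1, 1],  # Action 6: all five pumps
]

def pump_switches(prev_action: int, new_action: int) -> int:
    """Count how many individual pumps change state between two consecutive
    actions (indices are list positions 0-5)."""
    prev, new = ACTIONS[prev_action], ACTIONS[new_action]
    return sum(p != n for p, n in zip(prev, new))
```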
Table 4. Rule-based pumping operation at the Gasan pumping station.

When the Water Level Rises:
Water Level (m) | Pumping Operation Type
6.2 | Action 1
6.3 | Action 2
6.4 | Action 3
6.5 | Action 4
6.6 | Action 5

When the Water Level Drops:
Water Level (m) | Pumping Operation Type
5.9 | Action 4
5.8 | Action 3
5.7 | Action 2
5.6 | Action 1
5.5 | Action 0
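Table 4 describes a hysteresis-style rule: the operation type is stepped up as the level crosses rising thresholds and stepped down as it crosses falling thresholds, with the previous action otherwise held. The sketch below is one plausible coding of that behaviour; the constant and function names are illustrative assumptions, and the station's actual control logic may differ in detail.

```python
# Thresholds from Table 4 (level in m -> operation type index as printed there).
RISING_RULES = [(6.6, 5), (6.5, 4), (6.4, 3), (6.3, 2), (6.2, 1)]   # checked from the highest level down
FALLING_RULES = [(5.5, 0), (5.6, 1), (5.7, 2), (5.8, 3), (5.9, 4)]  # checked from the lowest level up

def rule_based_action(level: float, prev_level: float, prev_action: int) -> int:
    """Return the pumping operation type for the current basin water level.

    Assumed behaviour: while the level rises, the highest satisfied rising
    threshold applies and pumps are never stepped down; while it falls, the
    lowest satisfied falling threshold applies and pumps are never stepped up;
    between thresholds the previous action is kept.
    """
    if level > prev_level:                       # water level rising
        for threshold, action in RISING_RULES:
            if level >= threshold:
                return max(action, prev_action)  # do not reduce pumping while rising
    elif level < prev_level:                     # water level falling
        for threshold, action in FALLING_RULES:
            if level <= threshold:
                return min(action, prev_action)  # do not add pumping while falling
    return prev_action                           # no threshold crossed: keep current pumps
```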
Table 5. Duration-specific probability rainfall in Seoul, Korea.

Duration-specific probability rainfall (mm) by duration (min):
Frequency (Year) | 60 | 120 | 180 | 240 | 360 | 540 | 720 | 1080 | 1440
10  | 64.8 | 87.1  | 101.4 | 114.8 | 137.0 | 158.4 | 169.1 | 186.6 | 198.8
20  | 73.6 | 99.3  | 115.6 | 131.1 | 157.0 | 181.9 | 193.3 | 213.1 | 226.9
30  | 78.6 | 106.3 | 123.8 | 140.5 | 168.5 | 195.5 | 207.2 | 228.4 | 243.0
50  | 84.9 | 115.1 | 134.0 | 152.2 | 183.0 | 212.4 | 224.6 | 247.4 | 263.2
80  | 90.6 | 123.1 | 143.3 | 163.0 | 196.1 | 227.8 | 240.5 | 264.8 | 281.6
100 | 93.3 | 126.9 | 147.8 | 168.1 | 202.4 | 235.2 | 248.0 | 273.1 | 290.4
Table 6. Weights of the four quartiles applied to the regression equation.

Quartile | w1 | w2 | w3 | w4 | w5 | w6
First quartile  | 0.5462  | 0.1414   | −0.005158 | 7.948 × 10−5  | −5.774 × 10−7 | 1.615 × 10−9
Second quartile | 0.4219  | −0.03800 | 0.004340  | −1.041 × 10−4 | 9.786 × 10−7  | −3.269 × 10−9
Third quartile  | −0.1844 | 0.08131  | −0.004237 | 1.042 × 10−4  | −1.082 × 10−6 | 3.941 × 10−9
Fourth quartile | 0.4736  | −0.04096 | 0.002784  | −6.970 × 10−5 | 7.689 × 10−7  | −3.041 × 10−9
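The weights in Table 6 are consistent with a dimensionless Huff-type cumulative-rainfall curve of the form P(t) = w1·t + w2·t² + … + w6·t⁶, with t the elapsed time as a percentage of the storm duration and P the cumulative rainfall as a percentage of the storm total (the first-quartile weights yield roughly 100% at t = 100%). That functional form, and the function names below, are assumptions made here for illustration; the paper's own regression equation should be consulted for the exact definition.

```python
import numpy as np

# First-quartile weights w1..w6 from Table 6 (the other quartiles are analogous).
FIRST_QUARTILE_W = [0.5462, 0.1414, -0.005158, 7.948e-5, -5.774e-7, 1.615e-9]

def cumulative_fraction(t_percent, weights):
    """Assumed Huff-style regression: cumulative rainfall (% of storm total)
    at t_percent (% of storm duration), P(t) = sum_i w_i * t**i."""
    return sum(w * t_percent ** (i + 1) for i, w in enumerate(weights))

def huff_hyetograph(total_depth_mm: float, duration_min: int, step_min: int = 2,
                    weights=FIRST_QUARTILE_W) -> np.ndarray:
    """Distribute a design rainfall depth (e.g., from Table 5) over the storm
    duration at a fixed time step, returning incremental depths per step (mm)."""
    t = np.arange(0, duration_min + step_min, step_min) / duration_min * 100.0
    cum = np.clip(cumulative_fraction(t, weights), 0.0, 100.0) / 100.0 * total_depth_mm
    return np.diff(cum)
```

For example, under these assumptions the 30-year, 60 min depth of 78.6 mm from Table 5 would be distributed over 2 min steps by huff_hyetograph(78.6, 60).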
Table 7. The algorithm and parameters configured in the DDQN model.

Parameters | Values
Loss function | Mean squared error
Learning rate | 1 × 10−3
Optimizer | Adam optimizer
Discount factor γ, used in RL | 0.4
Epsilon ε | Initial value 1.0, last value 0.2; value decreases by 1/the number of current samples
Replay memory size | 100
Batch size | 20
Target network parameter update period | 10 samples (episodes)
Training epochs | 2552 (the number of training samples)
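For orientation, the sketch below collects the hyperparameters of Table 7 into a plain-Python configuration and shows the standard Double-DQN target computation they feed into. The class and function names are illustrative assumptions; this is not the paper's training code, only a reminder of how a discount factor of 0.4 and periodic target-network updates enter the learning rule.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class DDQNConfig:
    """Hyperparameters as listed in Table 7 (field names are illustrative)."""
    learning_rate: float = 1e-3
    gamma: float = 0.4              # discount factor used in the RL objective
    eps_start: float = 1.0          # initial exploration rate
    eps_end: float = 0.2            # final exploration rate
    replay_memory_size: int = 100
    batch_size: int = 20
    target_update_period: int = 10  # samples (episodes) between target-network syncs
    training_epochs: int = 2552     # one epoch per training sample

def double_q_targets(rewards: np.ndarray, next_q_online: np.ndarray,
                     next_q_target: np.ndarray, done: np.ndarray,
                     gamma: float = 0.4) -> np.ndarray:
    """Standard Double-DQN target: the online network selects the next action,
    the periodically updated target network evaluates it."""
    best_next = np.argmax(next_q_online, axis=1)                   # action selection (online net)
    eval_q = next_q_target[np.arange(len(best_next)), best_next]   # action evaluation (target net)
    return rewards + gamma * (1.0 - done) * eval_q
```

With batches of 20 transitions drawn from a replay memory of 100, such targets would be regressed against the online network's Q-values under the mean-squared-error loss using the Adam optimizer at a learning rate of 1 × 10−3.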
Table 8. The maximum water level of the three pump operation models for all scenarios.

Maximum water level (m) by return period (year):
Duration (min) | Model | 10 | 20 | 30 | 50 | 80 | 100
60   | DDQN (we = 1)         | 6.74 | 7.31 | 7.84 | 8.11 | 8.33 | 8.49
60   | DDQN (we = 1, wp = 2) | 6.85 | 7.45 | 7.81 | 8.10 | 8.30 | 8.52
60   | Rule base             | 7.57 | 8.05 | 8.34 | 8.77 | 9.27 | 9.37
120  | DDQN (we = 1)         | 6.96 | 7.77 | 8.19 | 8.89 | 9.55 | 9.79
120  | DDQN (we = 1, wp = 2) | 7.08 | 7.83 | 8.16 | 8.82 | 9.43 | 9.81
120  | Rule base             | 8.06 | 8.78 | 9.08 | 9.57 | 9.95 | 10.00
180  | DDQN (we = 1)         | 6.72 | 7.80 | 7.99 | 8.97 | 9.65 | 9.83
180  | DDQN (we = 1, wp = 2) | 6.47 | 7.73 | 8.15 | 8.83 | 9.66 | 9.75
180  | Rule base             | 7.57 | 8.51 | 9.04 | 9.68 | 9.90 | 10.00
240  | DDQN (we = 1)         | 6.36 | 7.70 | 7.93 | 8.94 | 9.74 | 9.86
240  | DDQN (we = 1, wp = 2) | 6.28 | 7.46 | 8.01 | 8.95 | 9.43 | 9.83
240  | Rule base             | 7.29 | 8.22 | 9.07 | 9.49 | 9.90 | 10.00
360  | DDQN (we = 1)         | 5.32 | 6.64 | 7.69 | 8.50 | 9.19 | 9.58
360  | DDQN (we = 1, wp = 2) | 5.27 | 6.74 | 7.57 | 8.40 | 9.35 | 9.17
360  | Rule base             | 6.73 | 7.56 | 8.38 | 9.36 | 9.86 | 9.94
540  | DDQN (we = 1)         | 4.70 | 5.00 | 5.79 | 6.62 | 7.62 | 8.20
540  | DDQN (we = 1, wp = 2) | 4.70 | 4.91 | 5.51 | 6.84 | 7.67 | 8.18
540  | Rule base             | 6.59 | 6.81 | 7.14 | 7.43 | 8.32 | 8.55
720  | DDQN (we = 1)         | 4.70 | 4.70 | 4.70 | 4.83 | 5.07 | 5.24
720  | DDQN (we = 1, wp = 2) | 4.70 | 4.70 | 4.70 | 4.96 | 4.83 | 5.52
720  | Rule base             | 6.51 | 6.56 | 6.61 | 6.62 | 6.67 | 6.76
1080 | DDQN (we = 1)         | 4.70 | 4.70 | 4.70 | 4.70 | 4.70 | 4.70
1080 | DDQN (we = 1, wp = 2) | 4.70 | 4.70 | 4.70 | 4.70 | 4.70 | 4.70
1080 | Rule base             | 6.46 | 6.51 | 6.52 | 6.55 | 6.55 | 6.58
1440 | DDQN (we = 1)         | 4.70 | 4.70 | 4.70 | 4.70 | 4.70 | 4.70
1440 | DDQN (we = 1, wp = 2) | 4.70 | 4.70 | 4.70 | 4.70 | 4.70 | 4.70
1440 | Rule base             | 6.41 | 6.41 | 6.42 | 6.51 | 6.51 | 6.52
Table 9. The number of pump changes of the three pump operation models for all scenarios.

Number of pump changes by return period (year):
Duration (min) | Model | 10 | 20 | 30 | 50 | 80 | 100
60   | DDQN (we = 1)         | 14.77 | 13.54 | 14.15 | 13.85 | 12.39 | 10.92
60   | DDQN (we = 1, wp = 2) | 7.15  | 7.15  | 7.92  | 8.39  | 8.54  | 7.77
60   | Rule base             | 9.92  | 9.92  | 9.39  | 7.77  | 5.31  | 5.69
120  | DDQN (we = 1)         | 14.92 | 12.92 | 12.15 | 10.77 | 10.00 | 8.46
120  | DDQN (we = 1, wp = 2) | 8.69  | 8.77  | 8.85  | 9.62  | 8.69  | 8.39
120  | Rule base             | 9.92  | 8.08  | 9.08  | 8.85  | 6.15  | 6.92
180  | DDQN (we = 1)         | 15.39 | 14.54 | 14.23 | 13.31 | 12.77 | 13.62
180  | DDQN (we = 1, wp = 2) | 8.69  | 8.23  | 8.54  | 9.15  | 9.00  | 9.15
180  | Rule base             | 10.00 | 9.31  | 9.15  | 8.85  | 8.08  | 8.08
240  | DDQN (we = 1)         | 18.00 | 17.08 | 16.39 | 14.15 | 13.15 | 15.15
240  | DDQN (we = 1, wp = 2) | 8.85  | 9.00  | 8.54  | 8.54  | 8.39  | 10.54
240  | Rule base             | 10.46 | 10.31 | 8.46  | 9.23  | 8.77  | 8.31
360  | DDQN (we = 1)         | 17.46 | 16.46 | 17.69 | 20.00 | 15.46 | 17.00
360  | DDQN (we = 1, wp = 2) | 9.31  | 9.00  | 9.62  | 9.54  | 9.46  | 8.69
360  | Rule base             | 10.31 | 10.23 | 10.54 | 8.54  | 8.46  | 7.31
540  | DDQN (we = 1)         | 16.69 | 16.39 | 17.00 | 17.46 | 17.92 | 17.77
540  | DDQN (we = 1, wp = 2) | 10.85 | 10.08 | 9.85  | 10.39 | 10.85 | 11.00
540  | Rule base             | 10.77 | 12.31 | 11.23 | 11.62 | 10.77 | 12.62
720  | DDQN (we = 1)         | 17.15 | 17.31 | 18.54 | 17.62 | 18.69 | 17.92
720  | DDQN (we = 1, wp = 2) | 11.85 | 11.23 | 11.69 | 11.39 | 10.54 | 10.46
720  | Rule base             | 12.00 | 10.54 | 11.62 | 12.92 | 13.39 | 13.23
1080 | DDQN (we = 1)         | 17.31 | 17.31 | 18.08 | 17.15 | 18.85 | 17.46
1080 | DDQN (we = 1, wp = 2) | 14.15 | 12.69 | 12.46 | 13.08 | 11.62 | 11.62
1080 | Rule base             | 12.15 | 14.00 | 14.15 | 14.69 | 13.85 | 14.62
1440 | DDQN (we = 1)         | 16.39 | 18.08 | 17.77 | 17.31 | 17.46 | 17.92
1440 | DDQN (we = 1, wp = 2) | 13.69 | 13.46 | 14.92 | 13.85 | 13.77 | 14.15
1440 | Rule base             | 18.77 | 17.23 | 14.69 | 14.69 | 17.69 | 19.00
Table 10. Comparison of average performance in terms of maximum water level and pump changes.

Model | Average Maximum Water Level (m) | Average Pump Changes
DDQN (we = 1)         | 6.85 | 15.78
DDQN (we = 1, wp = 2) | 6.83 | 10.22
Rule-based            | 7.96 | 10.93