Article

Real-Time Control of A2O Process in Wastewater Treatment Through Fast Deep Reinforcement Learning Based on Data-Driven Simulation Model

1 College of Civil Engineering, University of New South Wales, Sydney 2052, Australia
2 College of Hydrology and Water Resources, Hohai University, Nanjing 210098, China
* Author to whom correspondence should be addressed.
Water 2024, 16(24), 3710; https://doi.org/10.3390/w16243710
Submission received: 16 November 2024 / Revised: 18 December 2024 / Accepted: 20 December 2024 / Published: 22 December 2024
(This article belongs to the Section Wastewater Treatment and Reuse)

Abstract

Real-time control (RTC) can be applied to optimize the operation of the anaerobic–anoxic–oxic (A2O) process in wastewater treatment for energy saving. In recent years, many studies have utilized deep reinforcement learning (DRL) to construct novel AI-based RTC systems for optimizing the A2O process. However, existing DRL methods require A2O process mechanistic models for training and, therefore, specified data for the construction of those models, which is often difficult to obtain in wastewater treatment plants (WWTPs) where data collection facilities are inadequate. DRL training is also time-consuming because it needs multiple simulations of the mechanistic model. To address these issues, this study designs a novel data-driven RTC method. The method first creates a simulation model of the A2O process using an LSTM combined with an attention module (LSTM-ATT), which can be established from flexible data collected in the A2O process. The LSTM-ATT model can be viewed as a simplified version of a large language model (LLM) architecture: it is far more capable of analyzing time-sequence data than conventional deep learning models, yet its small architecture avoids overfitting the A2O dynamic data. On this basis, a new DRL training framework is constructed, leveraging the rapid computation of LSTM-ATT to accelerate DRL training. The proposed method is applied to a WWTP in Western China, where an LSTM-ATT simulation model is built and used to train a DRL RTC model that reduces aeration while keeping the effluent qualified. For the LSTM-ATT simulation, the mean squared error remains between 0.0039 and 0.0243, while the R-squared values are larger than 0.996. The control strategy provided by the deep Q-network (DQN) effectively reduces the average DO setpoint from 3.956 mg/L to 3.884 mg/L with acceptable effluent quality. This study provides a purely data-driven, DRL-based RTC method for the A2O process in WWTPs that is effective in energy saving and consumption reduction, and it demonstrates that purely data-driven DRL can construct effective RTC for the A2O process, providing a decision-support method for management.

1. Introduction

In China, the anaerobic–anoxic–oxic (A2O) process, based on the conventional activated sludge process, has become the primary technology for most wastewater treatment plants, handling over 80% of wastewater [1,2]. The real-time control (RTC) of A2O systems is crucial in wastewater treatment in several respects. RTC systems enhance treatment by adjusting gas and energy usage in real time to improve nitrogen and phosphorus removal and to reduce consumption. They also maintain stability amid influent fluctuations, ensuring that the effluent quality meets standards. With IoT and big data advancements, A2O RTC systems can integrate more sensors and data tools, boosting efficiency and supporting decision-making [3].
RTC strategies range from traditional controllers, e.g., proportional–integral (PI) and rule-based control, to more advanced techniques, e.g., model-based predictive control, fuzzy logic, and artificial neural networks [4]. Traditional RTC methods for the A2O process have several shortcomings that may affect the efficiency and effectiveness of wastewater treatment. They typically rely on experience and fixed operational parameters, with periodic sampling and manual adjustments, leading to a slow response to real-time changes in influent water quality [4]. This can result in low control accuracy and an inability to achieve optimal treatment outcomes. Moreover, lacking real-time monitoring and dynamic adjustment, traditional RTC methods cannot adapt to these complex changes and often result in high energy consumption. Currently, the average energy consumption for wastewater treatment in China has reached 0.29 kW·h/m³ [5,6], indicating significant potential for improvement.
A large number of studies have designed RTC systems for the A2O process using model predictive control (MPC). These systems use optimization algorithms and predictive models to optimize the control strategy in real time, reducing energy consumption while ensuring effluent quality; they predict process variables and provide appropriate aeration levels, ensuring more efficient and cost-effective operation than traditional methods [7,8,9]. Recent studies have also developed predictive models using more advanced deep learning methods and integrated them into MPC [2,7,10]. Data-driven methods for WWTP simulation include artificial neural network (ANN) models [11], fuzzy logic control (FLC) models [12], and principal component analysis (PCA) [13]. In the last two years, machine learning models including Support Vector Regression (SVR), deep neural networks (DNNs), and XGBoost (eXtreme Gradient Boosting) have been used for prediction tasks in WWTPs [14]. Ensemble tree-based models were designed for predicting chemical oxygen demand (COD) and total nitrogen (TN) concentrations in WWTPs [15], and a hybrid deep learning model was developed for WWTP effluent prediction [16]. By leveraging the strong generalization ability of deep learning models and their adaptability to random factors, these approaches provide more accurate predictions for MPC and enhance its control effectiveness. However, MPC still requires the continuous invocation of models and optimization algorithms to solve real-time optimization problems during operation. This demands both an accurate predictive model of the A2O dynamic process and the rapid solution of real-time optimization problems, which may pose challenges for real-world implementation.
In recent years, DRL has also been applied to the RTC of the A2O process [17,18,19,20]. This method trains a neural network-based controller (called the agent) by applying it in a virtual environment (called the environment) to collect training data and using the collected data along with the corresponding training algorithms to upgrade the controller. In this way, the agent gradually learns how to control the relevant system, such as the A2O process. The trained agent only requires the current state variables of the A2O system as inputs to provide control actions during real-time applications. Therefore, it eliminates the need for extensive solving of the optimization model, as seen in MPC, making it more promising for RTC issues in the A2O process. As a solution technique for RTC and dynamic programming problems, this method has been applied in various fields, including autonomous driving [21], robotic control [22], game AI [23,24], and even in water resource management [25] and wastewater transport systems [26,27].
Current DRL training methods need a model of the A2O dynamic process, like BSM1 or ASM2d. These methods require many simulations to gather data from the control process. However, these mechanistic models require sufficient and specified data in terms of quantity and quality for model construction. This is often challenging in real-world conditions. Also, due to the time-consuming computing process of mechanistic models and the repeated calculations during DRL training, the training process of DRL can be computationally expensive, which severely hinders the practical application of DRL.
To address the above issues, this study aims to construct a novel DRL framework for the RTC of the A2O process using a data-driven predictive model based on deep learning. The relevant data collected from the A2O process can be used to design a relatively simplified dynamic model through the deep learning method. This allows us to find the dynamic relationships through any type of (rather than specified) data and form a simulation model [2,7,10]. The combination of the LSTM model with the attention framework [28], called LSTM-ATT, is used as the deep learning model. This model is a simplified version of a large language model (LLM) used to deal with time-sequence data [29,30]. This framework is utilized because of its powerful ability in analyzing time-sequence data and its less complex architecture than LLMs, which may overfit the A2O dynamic data. Moreover, once trained, the model only requires a single feedforward computation of the neural network to complete the simulation, significantly improving the computational speed. This addresses the shortcomings of the training speed of existing DRL in the RTC of the A2O process. Therefore, this method compensates for the limitations of current DRL approaches and facilitates better application in the optimization and scheduling of the A2O process.

2. Materials and Methods

2.1. Case Study

A wastewater treatment plant (WWTP) in Western China was used for the case study. This large-scale facility covers 23 hectares and serves an area of 62 square kilometers, providing wastewater treatment services for a population of 820,000. The initial design capacity is 400,000 cubic meters of wastewater per day, with the first and second phases treating 300,000 tons per day and the third phase treating 100,000 tons per day. The plant employs an advanced two-stage biological treatment process for phosphorus and nitrogen removal, achieving effluent quality that meets national Class I discharge standards. The main process is the A2O treatment process, which effectively removes organic matter and nutrients such as nitrogen and phosphorus through microbial degradation, ensuring safe and stable effluent quality.
For the A2O process of the WWTP, a real-time data monitoring system was built to collect data for management and operation. The data include the influent data, the biological treatment data of the biochemical reaction pool, and the effluent data. The influent data are collected after the primary sedimentation tank and before biological treatment, while the effluent data are collected at the plant's discharge outlet. The online monitored variables include chemical oxygen demand (COD), ammonia nitrogen (NH4+), suspended solids (SS), total nitrogen (TN), total phosphorus (TP), mixed liquor suspended solids concentration (MLSS), internal recirculation flow rate (Qr), sludge internal recycle flow rate (Qsr), the setting values of dissolved oxygen (DO), water temperature (T), and water flow. The monitoring system ran for 1 year (1 January 2018 to 31 December 2018), and data from a 7-day period were used as the case data. The A2O layout and the locations of each sensor are shown in Figure 1, and the statistical information of the A2O process data is given in Table 1. Since water quality data from the anaerobic, anoxic, and oxic tanks were not obtained, a mechanistic model cannot be built from the current data.

2.2. Dynamic Model Based on Deep Learning Method

Using the above data, a simulation model of the A2O process was constructed through deep learning methods, laying the foundation for subsequent testing. Considering the purely data-driven scenario and the limitations of the mechanistic models in terms of data requirements, this study only employed a data-driven model as the simulation environment, rather than using a mechanistic model.
This study builds upon previous research by integrating the LSTM model with the attention framework [28] to construct a novel A2O process simulation model called LSTM-ATT. The main goal of this model is to predict future effluent quality using inflow data, DO control, and the current state of the biological reaction tank. Therefore, the model's inputs consist of three components (inflow, DO control, and the state of the biological reaction tank), while the model's output is the effluent quality at future timepoints.
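For concreteness, the sketch below shows one way such an LSTM-plus-attention predictor could be arranged in PyTorch. The feature counts, hidden size, head count, and last-timestep pooling are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class LSTMATT(nn.Module):
    """Sketch of an LSTM + attention predictor: a 10-step input window
    (inflow data, DO setpoints, tank state) -> effluent quality at t + 1 h."""

    def __init__(self, n_features=15, n_effluent=5, hidden=64, heads=4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.head = nn.Linear(hidden, n_effluent)

    def forward(self, x):
        # x: (batch, 10, n_features), already min-max normalized
        h, _ = self.lstm(x)            # time-series features over the HRT window
        a, _ = self.attn(h, h, h)      # self-attention re-weights the timesteps
        return self.head(a[:, -1, :])  # predict effluent quality one hour ahead

# Example: one forward pass on a dummy 10 h window of 15 features
model = LSTMATT()
y = model(torch.randn(8, 10, 15))      # -> shape (8, 5)
```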
Considering the data collection circumstances, a timestep of 1 h was set, meaning the model predicts the effluent quality one hour ahead. Given that biological reaction processes often last for one hydraulic retention time (HRT), the input data within the previous HRT were also used as model inputs; referring to the operational standards of wastewater treatment plants, this study considers an HRT of 10 h. The input data are passed through an LSTM module for time-series feature analysis to extract key features, and the attention module then computes the final prediction from these features. The model's inputs and outputs, along with the overall framework, are illustrated in Figure 2. All inputs and outputs are normalized before entering the model, and the computed results are then denormalized. This study employs min–max normalization and denormalization, with the formulae shown in Equation (1), where $\mathrm{data}_i$ is the i-th data term, $\mathrm{minData}_i$ and $\mathrm{maxData}_i$ are its historical minimum and maximum values, respectively, and $\mathrm{normalizedData}_i$ is the normalized value. This transformation maps the data into the range from −1 to 1, ensuring the stability of the model calculations.
$$\mathrm{normalizedData}_i = \frac{2\,(\mathrm{data}_i - \mathrm{minData}_i)}{\mathrm{maxData}_i - \mathrm{minData}_i} - 1, \qquad \mathrm{data}_i = \mathrm{minData}_i + \frac{\mathrm{normalizedData}_i + 1}{2}\,(\mathrm{maxData}_i - \mathrm{minData}_i) \tag{1}$$
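A direct implementation of Equation (1) might look as follows; the array-based interface is an assumption for illustration.

```python
import numpy as np

def normalize(data, dmin, dmax):
    """Min-max scale a data term to [-1, 1], per Equation (1)."""
    return 2.0 * (data - dmin) / (dmax - dmin) - 1.0

def denormalize(norm, dmin, dmax):
    """Invert the scaling to recover the physical units."""
    return dmin + (norm + 1.0) / 2.0 * (dmax - dmin)

# Example: influent COD with historical min 381 and max 933 mg/L (Table 1)
cod = np.array([566.0, 590.25])
print(normalize(cod, 381.0, 933.0))  # values inside [-1, 1]
```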

2.3. DRL Based on LSTM-ATT

Because existing DRL methods require A2O mechanistic models as the training environment, their computation is very time-consuming, and model establishment requires specified data. This creates numerous obstacles to the practical application of existing DRL. This paper uses LSTM-ATT as the environment to train DRL, leveraging its rapid simulation and its independence from large amounts of specified data to address the shortcomings of traditional DRL. This study takes a value-based DRL method, the deep Q-network (DQN), as an example.
The standard DRL training framework consists of two parts: an agent that acts as a controller, and a virtual environment system controlled by the agent. After training, the agent can be used for RTC in practical problems. The training process includes two steps. First, the agent controls the virtual environment and collects data during the control process. These data include the real-time state $s_t$ (e.g., inflow data and the state of the biological reaction tank), the RTC action $a_t$ (e.g., aeration amount or DO setpoint), and a manually assigned score for the current control effect, referred to as the reward $r_t$ (e.g., whether the effluent meets standards and whether aeration is reduced). This step is called sampling. This paper selects the aforementioned state and influent as $s_t$ for the DRL model and the aforementioned control as $a_t$. The reward is based on meeting the effluent water quality standards while appropriately reducing the dissolved oxygen (DO), as shown in Equation (2), where $w_1$ and $w_2$ are the weights of the two objectives (0.6 and 0.4 in this study, respectively), outflow and standard refer to the effluent water quality and the water quality standards, respectively, and DO is the controlled DO setpoint. If the effluent meets the standards and the aeration is relatively low, the reward is larger; conversely, if the standards are not met, the reward is smaller.
$$r_t = -\,w_1\,(\mathrm{outflow} - \mathrm{standard})^2 - w_2\,\mathrm{DO}^2 \tag{2}$$
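Read this way, Equation (2) can be computed as below. The negative signs follow from the text's statement that standard violations and high DO lower the reward; the exact aggregation over multiple effluent indicators is an assumption.

```python
import numpy as np

W1, W2 = 0.6, 0.4  # objective weights used in this study

def reward(outflow, standard, do_setpoint):
    """Reward of Equation (2): penalize effluent deviation from the
    standard and penalize high DO setpoints (i.e., aeration)."""
    deviation = np.sum((np.asarray(outflow) - np.asarray(standard)) ** 2)
    return -W1 * deviation - W2 * float(do_setpoint) ** 2
```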
Subsequently, the collected data are used to upgrade the agent with the corresponding DRL algorithm. This study uses DQN, one of the DRL methods, as the research object, thus employing the DQN training algorithm; this step is referred to as upgrading. Sampling and upgrading are repeated until the agent achieves satisfactory control performance, at which point training is considered complete. The virtual environment for sampling is usually a mechanistic model of the A2O dynamic process, which is time-consuming to run and requires specified data for model establishment. This study instead adopts LSTM-ATT as the environment, which allows DRL to be constructed without specified data and enhances the training speed through faster computation. The entire LSTM-ATT-based DQN framework is illustrated in Figure 3.
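A compact sketch of the sampling step follows, with the trained LSTM-ATT model standing in as the environment. The `env_step` wrapper, state dimensions, discrete DO action levels, and epsilon-greedy exploration are assumptions for illustration.

```python
import random
from collections import deque

import torch
import torch.nn as nn

N_STATE, N_ACTIONS = 20, 5          # assumed state size and discrete DO levels
q_net = nn.Sequential(nn.Linear(N_STATE, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
buffer = deque(maxlen=10_000)       # replay memory filled during sampling

def sample_step(env_step, state, eps=0.1):
    """One sampling step: act epsilon-greedily on the LSTM-ATT environment
    and store the transition (s_t, a_t, r_t, s_{t+1}) for later upgrading.
    `env_step` wraps one feedforward pass of LSTM-ATT plus Equation (2)."""
    if random.random() < eps:              # occasional exploration
        action = random.randrange(N_ACTIONS)
    else:                                  # greedy action from the current Q
        with torch.no_grad():
            action = int(q_net(state).argmax())
    next_state, r = env_step(state, action)
    buffer.append((state, action, r, next_state))
    return next_state
```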
This study designed the LSTM-ATT-based DQN for the aforementioned case study. The DQN agent estimates the Q value of the control problem through a deep neural network; the Q value is defined as the discounted sum of the rewards that may be obtained after taking a certain control action in the current state, as shown in Equation (3), where γ is the discount factor.
$$Q(s_t, a_t) = r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) \tag{3}$$
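The upgrading step then fits the network toward the Bellman target of Equation (3). This sketch reuses `q_net` and `buffer` from the sampling sketch above; the batch size, learning rate, and discount factor are illustrative assumptions.

```python
import random

import torch
import torch.nn as nn

GAMMA = 0.95  # discount factor (illustrative value)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def upgrade(batch_size=32):
    """Fit Q(s_t, a_t) toward r_t + gamma * max_a Q(s_{t+1}, a), Equation (3)."""
    batch = random.sample(buffer, batch_size)
    s = torch.stack([b[0] for b in batch])
    a = torch.tensor([b[1] for b in batch])
    r = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    s1 = torch.stack([b[3] for b in batch])
    with torch.no_grad():                       # bootstrap target, no gradient
        target = r + GAMMA * q_net(s1).max(dim=1).values
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, target)    # TD error as MSE loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```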
Through training, the DQN agent learns to use a neural network to provide the Q values of different actions in the current real-time state and to select the action with the highest Q value, thereby optimizing the process. Since this process uses a trained agent, it can achieve better control performance than rule-based methods. The trained agent no longer needs to solve an optimization problem in real time, resulting in faster computation and making it more suitable for RTC. Additionally, this study uses LSTM-ATT as the environment, allowing rapid and effective DRL training even with only partial data, addressing the shortcomings of traditional DRL.

2.4. Performance Index

This study designs evaluation metrics for the prediction model and the control model separately. The prediction model is primarily used to simulate the dynamic process of A2O; thus, the mean squared error (MSE) is used as the evaluation criterion. This is defined as the average of the squared differences between the predicted and actual values over N predictions, as shown in Equation (4), where $\mathrm{prediction}_i$ and $\mathrm{observation}_i$ are the simulated and actual values of the i-th prediction and observation, respectively.
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\mathrm{prediction}_i - \mathrm{observation}_i\right)^2 \tag{4}$$
The control model is evaluated from two aspects: the control effect, i.e., whether the effluent meets the standards, and the DO content, i.e., whether aeration can be minimized (a relatively low DO value) while the effluent quality still meets the standards. The control effect is measured by the mean reward (MR), the average of all rewards during a control process of T timesteps, as shown in Equation (5).
$$\mathrm{MR} = \frac{1}{T}\sum_{t=1}^{T} \mathrm{reward}_t \tag{5}$$
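Both metrics are straightforward to compute; a minimal sketch:

```python
import numpy as np

def mse(predictions, observations):
    """Mean squared error over N prediction/observation pairs, Equation (4)."""
    p, o = np.asarray(predictions), np.asarray(observations)
    return float(np.mean((p - o) ** 2))

def mean_reward(rewards):
    """Mean reward (MR) over a T-step control run, Equation (5)."""
    return float(np.mean(rewards))
```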

3. Results and Discussion

3.1. Performance of LSTM-ATT

From the above data, 70% were randomly selected as the training set, and the remaining 30% were used as the validation set to train the LSTM-ATT model, as sketched below. The results are shown in Figure 4. The MSE gradually decreases during training, and the MSEs on the training and validation sets decrease synchronously, indicating that training is effective. The error curves of the two sets are also highly consistent throughout training, which means that the A2O data distributions of the two sub-datasets are relatively consistent and balanced. This may differ for other A2O systems: data imbalance is a problem that any deep learning model may encounter and can impact model performance. Owing to the limited data obtained in this study, this could not be explored in depth here; further analysis is needed once more data with distribution differences become available.
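A random 70/30 split of this kind can be produced as below; the seed and sample count are assumptions for illustration.

```python
import numpy as np

def split_indices(n_samples, train_frac=0.7, seed=0):
    """Randomly split sample indices into training and validation sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(train_frac * n_samples)
    return idx[:n_train], idx[n_train:]

train_idx, val_idx = split_indices(168)  # e.g., 7 days of hourly samples
```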
The trained LSTM-ATT model was used to simulate the entire time period, and the simulation results were compared with the measured data to calculate the mean squared error (MSE) and R-squared of the simulation. The results in Figure 5 indicate that the trained model has good predictive performance, with the MSE of each predicted variable ranging from 0.0039 to 0.0243, which can support the subsequent control problem. Although the simulation is accurate overall, the MSE and R-squared still differ across data items. This is due to the diversity of the dynamic data patterns and distributions [31,32,33]. For example, the TP data were relatively stable in this study, so the MSE and R-squared of its predictions are better than those of the other items. Collecting more comprehensive A2O data can therefore help the model understand system changes and, to a certain extent, avoid the impact of data imbalance.

3.2. Training Curve and Time of the LSTM-ATT-Based DQN

Following the above training method, with LSTM-ATT as the environment, the total sum of rewards during training increases with the number of iterations, as shown in Figure 6. The DQN agent gradually learned how to set the DO values of the A2O process in real time to ensure (1) compliance with effluent standards and (2) that the DO value is kept as low as possible to reduce aeration. Furthermore, the entire training process was run on an Intel i7-9750H computing platform and took only 726.98 s for 100 iterations, i.e., approximately 7.27 s per iteration. This computation time is far lower than that of existing A2O mechanistic dynamic models, although, due to the incomplete data in this study, an A2O mechanistic model could not be established for direct comparison. This highlights the advantage of the proposed method: purely data-driven DRL can be used for the optimization and RTC of the A2O process, improving training speed while eliminating the need for specified data in the construction of RTC models.

3.3. Control Effect of DRLs

In this study, a 4-day period within the time range of the above data was selected to test the proposed control method, and the measured effluent data were compared with the control results. The mean reward of the control process is given in Figure 7. The effluent during the DQN control process is generally consistent with that of the actual control process and meets the effluent standards of the wastewater treatment plant. Furthermore, the DO setpoint values are lower than those in the actual control, indicating that the DQN trained with the above method can reduce DO to a certain extent while maintaining control effectiveness, allowing a reduction in aeration. It also suggests that there is still room for improvement in the existing A2O processes of wastewater treatment plants. Such improvements are often difficult to uncover through manual experience [34]; therefore, the methods and ideas provided in this study are worth referencing for enhancing the operational control and management of wastewater treatment plants.
In addition, the results reveal that the changes in effluent indicators under DQN control show a certain degree of improvement compared to the actual situation, indicating that the DQN can partially address the lag issues faced by rule-based control methods. At the same time, the reward curve in Figure 7 indicates that the DQN control process is not always ideal: in this case, the reward was relatively high in the early stages but gradually decreased later. This suggests that the A2O process passes through various states over time and, thus, the achievable degree of optimization varies across periods. RTC needs to identify the periods that can be optimized and improve control during those periods to ensure good overall performance.

4. Conclusions

The A2O process requires refined control; however, existing control methods lack adaptability to varying operational conditions, require specified data for model construction, and have difficulty optimizing control parameters in real time. This study leveraged deep learning and DRL to design a novel, purely data-driven DRL RTC system that addresses these deficiencies. The proposed method was applied to a wastewater treatment plant in Western China. Measured data were collected to train a new A2O process simulation and prediction model called LSTM-ATT; a DRL model was then trained for RTC on this simulation model and tested against the measured data. The results indicate that the simulation model achieved accurate simulation over the 4-day test period: the mean squared error (MSE) of the various predictions was maintained between 0.0039 and 0.0243, with R-squared values larger than 0.996. The DQN, while ensuring stable effluent, achieved an average DO setpoint of 3.8841 mg/L, lower than that of the existing manual control (3.9563 mg/L), resulting in reduced aeration and energy savings. The methods provided in this study thus offer an effective, purely data-driven technique for scheduling and analyzing the operation of the A2O process in wastewater treatment plants, supporting the intelligent and low-carbon transformation of wastewater treatment facilities.
The method is, however, highly dependent on data quality. Where data quality is poor, the effectiveness of the application may be limited: sensor malfunctions, missing data, or noisy data can adversely affect the model's training and prediction accuracy. Ensuring the accuracy and completeness of data is therefore crucial for practical applications.
Future research will consider the feasibility and effectiveness of this method in real-world applications, specifically exploring its implementation in different types of WWTPs and assessing its adaptability under various operational conditions. Improving data quality, such as by introducing data cleaning and preprocessing techniques or enhancing the reliability of sensor networks, should also be considered. Through these efforts, the practicality of this method can be further enhanced, promoting the intelligent and sustainable development of the wastewater treatment industry.

Author Contributions

Conceptualization, F.H. and Y.L.; methodology, F.H.; software, F.H.; validation, F.H., Y.L. and B.L.; formal analysis, F.H.; investigation, F.H.; resources, F.H.; data curation, F.H.; writing—original draft preparation, F.H.; writing—review and editing, F.H.; visualization, F.H.; supervision, X.Z.; project administration, X.Z.; funding acquisition, F.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Scientific Research Projects of the TB Hydropower Station under the China Huaneng Group’s Science and Technology Project (Grant no. HNKJ22-H87). Additionally, this study was supported by the Environmental Water Body Monitoring and Assessment Project of the Jinsha River Xulong Hydropower Station Impact Area in the Construction Period (Grant no. XL-FW-2023-012), National Energy Group Jinsha River Xulong Hydropower Co., Ltd.

Data Availability Statement

All of the code and data can be found on GitHub: https://github.com/fukanghu/Data-driven-simulation-model (accessed on 15 December 2024).

Acknowledgments

The authors would like to thank Chenglin Li and Zhixuan Cheng for providing the necessary data and resources for this study.

Conflicts of Interest

The authors declare that this study received funding from National Energy Group Jinsha River Xulong Hydropower Co., Ltd. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication.

References

  1. Dai, W.; Xu, X.; Liu, B.; Yang, F. Toward energy-neutral wastewater treatment: A membrane combined process of anaerobic digestion and nitritation—Anammox for biogas recovery and nitrogen removal. Chem. Eng. J. 2015, 279, 725–734. [Google Scholar] [CrossRef]
  2. Liu, Y.; Tian, W.; Xie, J.; Huang, W.; Xin, K. LSTM-Based Model-Predictive Control with Rationality Verification for Bioreactors in Wastewater Treatment. Water 2023, 15, 1779. [Google Scholar] [CrossRef]
  3. Zanetti, L.; Frison, N.; Nota, E.; Tomizioli, M.; Bolzonella, D.; Fatone, F. Progress in real-time control applied to biological nitrogen removal from wastewater. A short-review. Desalination 2012, 286, 1–7. [Google Scholar] [CrossRef]
  4. Sheik, A.G.; Tejaswini, E.S.S.; Seepana, M.M.; Ambati, S.R. Control of anaerobic-anoxic-aerobic (A2/O) processes in wastewater treatment: A detailed review. Environ. Technol. Rev. 2023, 12, 420–440. [Google Scholar] [CrossRef]
  5. Yan, P.; Qin, R.C.; Guo, J.S.; Yu, Q.; Li, Z.; Chen, Y.-P.; Shen, Y.; Fang, F. Net-zero-energy model for sustainable wastewater treatment. Environ. Sci. Technol. 2017, 51, 1017–1023. [Google Scholar] [CrossRef] [PubMed]
  6. Li, W.; Li, L.; Qiu, G. Energy consumption and economic cost of typical wastewater treatment systems in Shenzhen, China. J. Clean. Prod. 2017, 163, S374–S378. [Google Scholar] [CrossRef]
  7. Bernardelli, A.; Marsili-Libelli, S.; Manzini, A.; Stancari, S.; Tardini, G.; Montanari, D.; Anceschi, G.; Gelli, P.; Venier, S. Real-time model predictive control of a wastewater treatment plant based on machine learning. Water Sci. Technol. 2020, 81, 2391–2400. [Google Scholar] [CrossRef]
  8. Gurung, K.; Tang, W.; Sillanpää, M. Unit energy consumption as benchmark to select energy positive retrofitting strategies for Finnish Wastewater Treatment Plants (WWTPs): A Case Study of Mikkeli WWTP. Environ. Process. 2018, 5, 667–681. [Google Scholar] [CrossRef]
  9. Du, S.; Zhang, Q.; Cao, B.; Qiao, J. A Review of Model Predictive Control for Urban Wastewater Treatment Process. Inf. Control 2022, 51, 41–53. [Google Scholar]
  10. Pisa, I.; Santin, I.; Morell, A.; Vicario, J.L.; Vilanova, R. LSTM-Based Wastewater Treatment Plants Operation Strategies for Effluent Quality Improvement. IEEE Access 2019, 7, 159773–159786. [Google Scholar] [CrossRef]
  11. Nasr, M.; Moustafa, M.; Seif, H.; Kobrosy, G. Application of Artificial Neural Network (ANN) for the prediction of EL-AGAMY wastewater treatment plant performance-EGYPT. Alex. Eng. J. 2012, 51, 37–43. [Google Scholar] [CrossRef]
  12. Fiter, M.; Güell, D.; Comas, J.; Colprim, J.; Poch, M.; Rodríguez-Roda, I. Energy Saving in a Wastewater Treatment Process: An Application of Fuzzy Logic Control. Environ. Technol. 2005, 26, 1263–1270. [Google Scholar] [CrossRef] [PubMed]
  13. Wang, X.; Ratnaweera, H.; Holm, J.; Olsbu, V. Statistical monitoring and dynamic simulation of a wastewater treatment plant: A combined approach to achieve model predictive control. J. Environ. Manag. 2017, 193, 1–7. [Google Scholar] [CrossRef] [PubMed]
  14. Ye, G.; Wan, J.; Deng, Z.; Wang, Y.; Zhu, B.; Yan, Z.; Ji, S. Machine learning-based prediction of biological oxygen demand and unit electricity consumption in different-scale wastewater treatment plants. J. Environ. Chem. Eng. 2024, 12, 111849. [Google Scholar] [CrossRef]
  15. Cheng, Q.; Kim, J.-Y.; Wang, Y.; Ren, X.; Guo, Y.; Park, J.-H.; Park, S.-G.; Lee, S.-Y.; Zheng, G.; Wang, Y.; et al. Novel Ensemble Learning Approach for Predicting COD and TN: Model Development and Implementation. Water 2024, 16, 1561. [Google Scholar] [CrossRef]
  16. Xie, Y.; Chen, Y.; Wei, Q.; Yin, H. A hybrid deep learning approach to improve real-time effluent quality prediction in wastewater treatment plant. Water Res. 2024, 250, 121092. [Google Scholar] [CrossRef]
  17. Hernandez-del Olmo, F.; Llanes, F.H.; Gaudioso, E. An emergent approach for the control of wastewater treatment plants by means of reinforcement learning techniques. Expert Syst. Appl. 2012, 39, 2355–2360. [Google Scholar] [CrossRef]
  18. Pang, J.; Yang, S.; He, L.; Chen, Y.; Ren, N. Intelligent control/operational strategies in wwtps through an integrated q-learning algorithm with asm2d-guided reward. Water 2019, 11, 927. [Google Scholar] [CrossRef]
  19. Chen, K.; Wang, H.; Valverde-Pérez, B.; Zhai, S.; Vezzaro, L.; Wang, A. Optimal Control towards Sustainable Wastewater Treatment Plants Based on Multi-Agent Reinforcement Learning. Chemosphere 2021, 279, 130498. [Google Scholar] [CrossRef]
  20. Yang, Q.; Cao, W.; Meng, W.; Si, J. Reinforcement-Learning-Based Tracking Control of Waste Water Treatment Process Under Realistic System Conditions and Control Performance Requirements. IEEE Trans. Syst. Man. Cybern. Syst. 2022, 52, 5284–5294. [Google Scholar] [CrossRef]
  21. Kiran, B.R.; Sobh, I.; Talpaert, V.; Mannion, P.; Sallab, A.A.A.; Yogamani, S.; Pérez, P. Deep Reinforcement Learning for Autonomous Driving: A Survey. IEEE Trans. Intell. Transp. Syst. 2022, 23, 4909–4926. [Google Scholar] [CrossRef]
  22. Sangiovanni, B.; Rendiniello, A.; Incremona, G.P.; Ferrara, A.; Piastra, M. Deep Reinforcement Learning for Collision Avoidance of Robotic Manipulators. In Proceedings of the 2018 European Control Conference (ECC), Limassol, Cyprus, 12–15 June 2018; pp. 2063–2068. [Google Scholar]
  23. Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; van den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489. [Google Scholar] [CrossRef]
  24. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef] [PubMed]
  25. Castelletti, A.; Pianosi, F.; Restelli, M. A multiobjective reinforcement learning approach to water resources systems operation: Pareto frontier approximation in a single run. Water Resour. Res. 2013, 49, 3476–3486. [Google Scholar] [CrossRef]
  26. Tian, W.; Xin, K.; Zhang, Z.; Liao, Z.; Li, F. State Selection and Cost Estimation for Deep Reinforcement Learning-Based Real-Time Control of Urban Drainage System. Water 2023, 15, 1528. [Google Scholar] [CrossRef]
  27. Tian, W.; Fu, G.; Xin, K.; Zhang, Z.; Liao, Z. Improving the interpretability of deep reinforcement learning in urban drainage system operation. Water Res. 2024, 249, 120912. [Google Scholar] [CrossRef] [PubMed]
  28. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 6000–6010. [Google Scholar]
  29. Sun, C.; Qiu, X.; Xu, Y.; Huang, X. How to Fine-Tune BERT for Text Classification? In Proceedings of the Chinese Computational Linguistics: 18th China National Conference, CCL 2019, Kunming, China, 18–20 October 2019; pp. 194–206. [Google Scholar]
  30. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 5485–5551. [Google Scholar]
  31. Tian, W.; Liao, Z.; Wang, X. Transfer learning for neural network model in chlorophyll-a dynamics prediction. Environ. Sci. Pollut. Res. 2019, 26, 29857–29871. [Google Scholar] [CrossRef] [PubMed]
  32. Li, R.; Feng, K.; An, T.; Cheng, P.; Wei, L.; Zhao, Z.; Xu, X.; Zhu, L. Enhanced Insights into Effluent Prediction in Wastewater Treatment Plants: Comprehensive Deep Learning Model Explanation Based on SHAP. ACS ES&T Water 2024, 4, 1904–1915. [Google Scholar]
  33. Yin, W.-X.; Wang, Y.-Q.; Lv, J.-Q.; Chen, J.-J.; Liu, S.; Pang, Z.; Yuan, Y.; Bao, H.-X.; Wang, H.-C.; Wang, A.-J. A Machine Learning Framework for Enhanced Assessment of Sewer System Operation under Data Constraints and Skewed Distributions. ACS ES&T Eng. 2024, acsestengg.4c00477. [Google Scholar] [CrossRef]
  34. Duarte, M.S.; Martins, G.; Oliveira, P.; Fernandes, B.; Ferreira, E.C.; Alves, M.M.; Lopes, F.; Pereira, M.A.; Novais, P. A Review of Computational Modeling in Wastewater Treatment Processes. ACS ES&T Water 2024, 4, 784–804. [Google Scholar]
Figure 1. A2O layout and the locations of each sensor.
Figure 2. The architecture of LSTM-ATT.
Figure 3. The framework of the LSTM-ATT-based DQN.
Figure 4. Training process of LSTM-ATT.
Figure 5. The simulation of LSTM-ATT.
Figure 6. The training process of the LSTM-ATT-based DQN.
Figure 7. The outflow of the 4-day DQN control process compared with observation (left); the DO setpoint values given by the DQN compared with the true data (middle); the reward curve (right).
Table 1. Statistical information of A2O process data (time span: 7 days).

Type | Data and Unit | Maximum | Minimum | Average | Median
State | Aerobic MLSS (mg/L) | 8700.00 | 7644.00 | 8206.02 | 8261.06
State | Anaerobic MLSS (mg/L) | 8696.00 | 7071.00 | 7823.57 | 7836.50
State | Anoxic MLSS (mg/L) | 7348.00 | 6003.00 | 6525.47 | 6444.41
State | Qr (10³ m³/d) | 424.67 | 380.75 | 395.34 | 393.41
State | Qsr (10³ m³/d) | 157.29 | 141.02 | 146.42 | 145.71
Control | Setting value of DO1 (mg/L) | 4.13 | 3.03 | 3.73 | 3.79
Control | Setting value of DO2 (mg/L) | 4.33 | 3.70 | 3.99 | 3.97
Control | Setting value of DO3 (mg/L) | 4.13 | 3.77 | 3.95 | 3.95
Influent/inflow | Flow (10³ m³/d) | 449.39 | 402.91 | 418.35 | 416.31
Influent/inflow | COD (mg/L) | 933.00 | 381.00 | 590.25 | 566.00
Influent/inflow | SS (mg/L) | 800.00 | 183.00 | 414.91 | 373.00
Influent/inflow | TP (mg/L) | 13.91 | 4.97 | 8.61 | 8.19
Influent/inflow | TN (mg/L) | 68.70 | 33.20 | 54.05 | 55.10
Influent/inflow | NH4+ (mg/L) | 40.80 | 21.20 | 29.76 | 29.40
Influent/inflow | T (°C) | 21.00 | 18.00 | 19.31 | 19.00
Effluent/outflow | Flow (10³ m³/d) | 445.12 | 398.50 | 413.71 | 411.52
Effluent/outflow | COD (mg/L) | 33.00 | 13.00 | 21.19 | 21.00
Effluent/outflow | TP (mg/L) | 0.06 | 0.04 | 0.05 | 0.05
Effluent/outflow | TN (mg/L) | 8.56 | 5.57 | 6.88 | 6.86
Effluent/outflow | NH4+ (mg/L) | 0.14 | 0.07 | 0.11 | 0.12
