Article

Worldwide Research Progress and Trends in Application of Machine Learning to Wastewater Treatment: A Bibliometric Analysis

1
Shanghai Municipal Engineering Design Institute (Group) Co., Ltd., 901 North Zhongshan Road (2nd), Shanghai 200092, China
2
State Key Laboratory of Pollution Control and Resource Reuse, College of Environmental Science and Engineering, Tongji University, 1239 Siping Road, Shanghai 200092, China
*
Author to whom correspondence should be addressed.
Water 2025, 17(9), 1314; https://doi.org/10.3390/w17091314
Submission received: 18 March 2025 / Revised: 26 April 2025 / Accepted: 27 April 2025 / Published: 28 April 2025
(This article belongs to the Special Issue Advanced Biological Wastewater Treatment and Nutrient Removal)

Abstract
Efficient wastewater treatment with high-quality effluent and minimal operational costs and carbon emissions is vital for safeguarding the ecological environment and promoting human health. However, the wastewater treatment process is extremely complicated because it involves multiple treatment mechanisms, highly variable disturbances and nonlinear behaviors; optimizing the process through intelligent control is therefore a long-standing challenge for researchers and operators. Machine learning models are regarded as effective tools for wastewater treatment because they can better simulate and control complex nonlinear behaviors. With the aid of bibliometric analysis, this paper summarizes worldwide research progress and trends in the application of machine learning to wastewater treatment across 1226 related publications. The findings indicate that China and the United States are the two leading countries, with 342 and 209 publications, respectively, while the United States is an outstanding global collaboration leader in this field. Research institutions and authors are mainly from developing countries, and China accounts for the largest proportion of these. The analysis of journal and cited journal contributions shows that almost all of the top 10 journals in publications belong to the Q1 quartile (9/10). Overall, future research will likely focus on developing systematic, strong and multi-objective models for wastewater treatment. A hybrid model can combine the advantages of two or more machine learning models or mechanistic models, and such models have been verified as effective for tackling limited data. Thus, predicting the pollutants in the effluent rather than the influent using hybrid models is attracting increasing attention, because effective prediction helps reduce the shock that sharp influent fluctuations impose on effluent quality. The development of advanced data acquisition devices and of AI model prediction with partially missing data should also be a focus of future research.

1. Introduction

Wastewater is a growing health and environmental threat due to the continuously increasing population and rapid urbanization [1]. The global population is expected to exceed nine billion people by 2050. Most of this growth will take place in developing countries, particularly in urban areas that already have inadequate wastewater infrastructure. Therefore, the financial, environmental and social costs are projected to increase dramatically unless wastewater treatment is managed more precisely [2]. For example, as the largest developing country in the world, China has made significant progress in enhancing its wastewater treatment capacity, resulting in an effective reduction in pollutant emissions; by the end of 2021, more than 4500 wastewater treatment plants (WWTPs) had been built, and wastewater treatment capacity had reached 247 million m³/d [3]. In the United States, a total of 16,000 municipal wastewater treatment facilities are in operation nationwide and process approximately 128.7 million m³ of wastewater per day [4]. In European countries, households and certain industries in 21,708 urban areas also produce 108.85 million m³ of wastewater per day.
With population intensification, urbanization and limited environmental capacity, municipal wastewater discharge standards are becoming increasingly stringent [5]. Enhanced treatment systems enable wastewater plants to produce discharges that contain fewer nutrients and pollutants than those using conventional treatment methods [6]. However, since the wastewater industry is closely linked with energy consumption and greenhouse gas emissions, the implementation of carbon peaking and carbon neutrality strategies is driving its development towards green and low-carbon solutions [7,8]. Consequently, the development of high-efficiency and low-consumption wastewater treatment processes is at the forefront of environmental technology, given that conventional wastewater treatment processes still suffer from low removal capacity and high energy consumption.
Considering the current challenges of low operational load and excess nitrogen and phosphorus in effluent, it is imperative to develop new technologies for precisely operating wastewater treatment processes to achieve deep pollutant reduction with low-consumption operation [7]. However, the complex compositions and origins of wastewater make it difficult to improve and regulate treatment performance. The complex interactions among the multiple physicochemical properties of wastewater and various process parameters jointly make it difficult to control wastewater treatment processes precisely. As a result, no consensus has been reported on the optimal conditions for treating wastewater with minimal chemical and energy consumption, but artificial intelligence technologies may create more opportunities. Furthermore, in conjunction with the promulgation of relevant policies, the removal of emerging pollutants, such as microplastics and antibiotics, has attracted increasing concern in the current century [9,10]. Incorporating the removal of emerging pollutants into the conventional evaluation system for wastewater treatment performance will be an inevitable trend in the future [10].
The machine learning model, represented by artificial neural networks (ANNs), is formed by the interconnection of a large number of nodes (or neurons) representing specific output functions [11,12]. The connections of a neural network are regulated through weight values and excitation functions to optimize the network output and ultimately achieve a correlative fit between the input and output variables. Thus, the neural network model is nonlinear, adaptive and self-organizing [11,12] and has obvious advantages for simulating systems with large-scale, complex structures and unclear information. These advantages match the complex and variable data structure of wastewater treatment, so neural network models are expected to have significant potential in optimizing various wastewater treatment processes. Previous reviews have examined the potential of artificial intelligence (AI) technologies in wastewater treatment processes [13,14], including intelligent water quality monitoring, AI-assisted design of materials used for wastewater treatment and AI-based optimization of the energy costs of wastewater treatment. However, AI is a very broad science that includes robotics, language recognition, image recognition, natural language processing, expert systems, machine learning, computer vision and so on. Studies that focus specifically on machine learning models for improving wastewater treatment efficiency, especially influent prediction and energy saving, may still be lacking, and information on the leading countries, institutions and scholars in this research is still insufficient.
Accordingly, these shortcomings make it necessary to systematically analyze the current status of machine learning approaches for optimizing wastewater treatment processes, which is crucial for orienting the development of AI-based wastewater engineering.
In this study, a bibliometric analysis was conducted to reveal worldwide research progress and trends in the application of machine learning to wastewater treatment, especially the potential advantages of machine learning models for influent quality/quantity prediction and energy conservation (Figure 1). First, the related research was examined to visualize the cooperation among countries, institutions and authors involved in the field of AI-assisted wastewater treatment. Second, a co-occurrence analysis of keywords was used to explore the research content in this field. Third, cited authors and journals were visualized to identify leading researchers and prominent journals. The findings are expected to provide insights into future research directions for AI-based wastewater treatment processes, helping to avoid repetitive research and promote innovation in both scientific research and applications.

2. Materials and Methods

2.1. Data Sources

Among worldwide citation indexes, the Web of Science Core Collection (WoSCC) database is the most influential, covering the most important journals in the field of natural sciences; it contains the Science Citation Index Expanded (SCI-E), the Social Sciences Citation Index (SSCI) and other important citation indexes [15,16]. The WoSCC database has been widely used as a data source in previous bibliometric studies. Thus, using the WoSCC database to obtain research articles and patents, this study adopted the following search strategy: TS = (“wastewater” OR “waste-water” OR “waste water” OR “polluted water”) AND TS = (“artificial intelligence” OR “machine learning” OR “deep learning”). The analysis covered a time span of 34 years, from 1991 (in which the first relevant paper was published) to 2024.
Up to 25 January 2024, 1226 related items were collected and retained after excluding incomplete and duplicate records.
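The search strategy and the cleaning step above can be illustrated with a minimal sketch. It only assembles the query string in the form quoted in the text and filters a small list of records; the record fields ("title", "year"), the function names and the rule for what counts as incomplete or duplicate are assumptions made for illustration, not the procedure actually used in this study.

```python
def build_topic_query(term_groups):
    """Join terms within a group with OR and join the groups with AND,
    wrapping each group in the TS (topic) field tag quoted in the text."""
    clauses = []
    for terms in term_groups:
        quoted = " OR ".join(f'"{term}"' for term in terms)
        clauses.append(f"TS = ({quoted})")
    return " AND ".join(clauses)


def clean_records(records):
    """Drop incomplete records (no title or no year) and duplicates.
    Hypothetical rule: two records are duplicates if their titles match
    after trimming whitespace and lowercasing."""
    seen_titles = set()
    kept = []
    for record in records:
        title = (record.get("title") or "").strip().lower()
        if not title or record.get("year") is None:
            continue  # incomplete record
        if title in seen_titles:
            continue  # duplicate record
        seen_titles.add(title)
        kept.append(record)
    return kept


wastewater_terms = ["wastewater", "waste-water", "waste water", "polluted water"]
ml_terms = ["artificial intelligence", "machine learning", "deep learning"]
query = build_topic_query([wastewater_terms, ml_terms])
print(query)

# Made-up records, used only to exercise the cleaning function.
sample_records = [
    {"title": "Paper A", "year": 2020},
    {"title": "paper a ", "year": 2020},   # duplicate after normalization
    {"title": "", "year": 2021},           # incomplete: no title
    {"title": "Paper B", "year": None},    # incomplete: no year
    {"title": "Paper C", "year": 2023},
]
cleaned = clean_records(sample_records)
print(len(cleaned), "records retained")
```

The query string uses straight quotation marks, which is an assumption about what a search interface would accept; the curly quotation marks in the text are typographic.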

2.2. Data Analysis Methods

This study used CiteSpace (version 6.1.R3) as the main tool for analyzing and visualizing the information hidden in papers and patents. The method has been validated by previous studies on environment-related topics, e.g., sludge resource utilization and biochar application in electrochemical energy storage devices [17,18]. CiteSpace helps characterize the collected papers and patents from different aspects, including Country, Author, Cited Journal, Institution and Keywords. In the subsequent visual analysis charts, the size of a circle and the thickness of its outer ring indicate the number of articles or the number of citations related to the corresponding country, author, journal or keyword, respectively. The thickness and number of the lines between two circles represent the strength of the cooperative relationship between the two countries or other objects. The index “Centrality”, a common and essential index in bibliometric analysis, is also included to characterize the cooperative relationships between a given object and other objects. Here, “Centrality” refers to the number of connections between a node and other nodes in the network; a node with high centrality is directly connected to many other nodes, indicating its importance in the network.
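To make the definition of “Centrality” given above concrete, the following sketch counts, for each node of a small collaboration network, the number of distinct nodes it is directly connected to and normalizes that count by the maximum possible number of connections. It follows the definition stated in the text only; other centrality measures, such as betweenness centrality, are defined differently and would give different values. The node labels and edges are invented for illustration.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return, for each node, the share of other nodes it is directly linked to.

    edges: iterable of (node_a, node_b) pairs, one per collaboration link.
    Repeated links between the same pair are counted once; self-links are ignored.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(linked) / (n - 1) for node, linked in neighbours.items()}


# Hypothetical network: A collaborates with B, C and D; B also collaborates with C.
example_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
scores = degree_centrality(example_edges)
for node in sorted(scores):
    print(node, round(scores[node], 2))
```

In this toy network, A is linked to all three other nodes and therefore receives the highest score, which mirrors the reading of a high-centrality country or institution in the figures.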

3. Results

3.1. Publication Characteristics

Figure 2 illustrates the growth trends of worldwide publications related to the application of machine learning to wastewater treatment. Since the first paper was published in 1991, the total number of publications has climbed to 1226 by 2024, at an average annual growth rate of 24.04%. The growth in publications can be roughly divided into three stages: (1) slow growth in 1991–2015, (2) steady growth in 2016–2019 and (3) rapid growth in 2020–2024. In the first stage, there were limited studies on the application of machine learning to wastewater treatment, with the number of publications not exceeding 10 per year. Since 2015, the number of studies on machine learning applications in wastewater treatment has entered a period of rapid growth. In 2016–2019, the number of papers and patents surged, with an average annual growth rate of 55.36% and an average annual increase of 14.7 papers. Moreover, the average annual growth rate reached its maximum of 59.77% during 2020–2024. The rapid growth in the number of publications from 2016 onward can be attributed to a series of promotion policies that greatly encouraged and supported the development of artificial intelligence. For instance, in the United States, the National Strategic Plan for Artificial Intelligence Research and Development was issued in 2016 [19]. The plan aimed to strengthen basic research in artificial intelligence and ensure a leading position.
The growth trend of publications in China was basically consistent with that of the world, with average annual increases of 0.75, 15.75 and 86.67 publications in the three stages, respectively (Figure 3). From May 2015 to May 2016, China enacted several plans for the development of artificial intelligence, such as “Made in China 2025” and the “Three-year action plan of Internet Plus and artificial intelligence”. Overall, we may anticipate that the field of artificial intelligence and wastewater treatment is developing at an unprecedented rate and will continue to yield fruitful outcomes in the future.
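The text reports both an average annual growth rate (a percentage) and an average annual increase (a number of papers) without stating how they were computed. The sketch below shows one common way to obtain two such quantities from yearly publication counts: the mean of the year-over-year relative changes and the mean yearly increment. Both definitions, the function names and the counts are assumptions for illustration and are not taken from the study's data.

```python
def average_annual_growth_rate(counts):
    """Mean of year-over-year relative changes (e.g. 0.5 means +50% per year).
    Years whose preceding count is zero are skipped, since the ratio is undefined."""
    rates = [
        (current - previous) / previous
        for previous, current in zip(counts, counts[1:])
        if previous > 0
    ]
    if not rates:
        raise ValueError("need at least two consecutive years with a non-zero start")
    return sum(rates) / len(rates)


def average_annual_increment(counts):
    """Mean absolute increase in publications per year over the period."""
    if len(counts) < 2:
        raise ValueError("need at least two years of counts")
    return (counts[-1] - counts[0]) / (len(counts) - 1)


# Hypothetical yearly publication counts for four consecutive years.
yearly_counts = [10, 15, 30, 45]
growth_rate = average_annual_growth_rate(yearly_counts)
increment = average_annual_increment(yearly_counts)
print(f"average annual growth rate: {growth_rate:.2%}")
print(f"average annual increase: {increment:.2f} papers")
```

A compound annual growth rate, computed from the first and last counts only, is another plausible definition and would generally give a different percentage.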

3.2. Country Contributions

Understanding countries’ publishing indicators and collaboration is fundamental to comprehending the global distribution and development of research on artificial intelligence and wastewater treatment. Figure 4 visualizes the countries’ contributions, and Table 1 lists the top 10 countries in terms of publications. As shown in Figure 4, approximately 59 countries have paid attention to and made progress in this field, while the other countries (mainly in Africa and Central Asia), shown in gray, are still not active. According to Table 1, the top countries are China, the United States, India, Saudi Arabia, South Korea, Iran, Spain, England, Australia and Canada. China and the United States are the top two, with 342 and 209 publications, respectively. Among these countries, six are developed countries and four are developing countries, suggesting that developed countries tend to have more financial investment, stronger research capacity and more researchers in this field than developing countries. This may also help explain why most countries in Africa, almost all of which are developing countries, are not yet active in research on artificial intelligence and wastewater treatment. Figure 5 displays the countries’ collaboration network. In Figure 5, each circle represents one country, and the color of the circle changes from gray in the center to red at the edge, representing the time of publication from 1991 to 2024. The width of a color band represents the number of papers and patents published in the corresponding year. The lines between two circles represent cooperation between the two countries; the more lines there are, the closer the cooperation is. China and the United States are clearly the two leading collaborators.
China had more collaboration outputs (342) but a lower centrality (0.08); that is, China has abundant cooperative research outputs, but its cooperation needs to become more influential. In contrast, the United States had fewer collaboration outputs (209) but a higher centrality (0.22), and its related research started earlier, indicating that the United States is an outstanding global collaboration leader in this field. Moreover, Figure 5 reveals that some countries had fewer collaboration outputs than China but higher centrality scores and earlier research. This indicates that these countries are highly influential in academic cooperation in this field. Therefore, researchers should consider multiple factors, such as research impact and research direction, when seeking cooperation with other countries.

3.3. Institution Contributions

Since environmental pollution control is a global issue, it should be addressed through strong cooperation among research institutions in various countries. Figure 6 shows the institution collaboration network, and Table 2 lists the top 10 institutions by publications in the field of machine learning and wastewater treatment. Among the top 10 institutions, nine are from developing countries, while only one (the University of Technology Sydney) is from a developed country. This suggests that institutions in developing countries pay more attention to the field of machine learning and wastewater treatment than those in developed countries. The Chinese Academy of Sciences emerged as the leading institution in terms of the number of publications, and it had the second-highest centrality index (0.06). Additionally, the institutions ranked joint second and eighth by publication numbers were Harbin Institute of Technology, Guizhou Normal University and Tsinghua University, all located in China. This outstanding research output is primarily due to the policy support of the Chinese government and indicates that China has played a critical role in the research field of machine learning and wastewater treatment. Duy Tan University had the highest centrality index (0.08), even though it ranked only seventh in publications, revealing that research cooperation between Duy Tan University and other institutions around the world is more frequent. Moreover, two other Asian countries, Saudi Arabia and Iran, have also made important contributions to research in this field.

3.4. Author Contributions

This section uses the number of publications as the primary factor to systematically analyze author contributions. Figure 7 depicts the author collaboration network, in which larger circles indicate a greater number of publications co-authored with other authors. The circles for Hu, Jiwei; Wei, Xionghui; Nasr, Mahmoud; and others are clearly larger, meaning that their publication numbers are growing and that they have a significant influence on subsequent research. In the early period (from 1991 to about 2005), there were many single authors, but more cooperation among researchers. These authors were mainly from Spain, and it can be inferred that research on machine learning and wastewater treatment was carried out earlier and received more attention in Spain. Over time, the number of researchers in this field increased. However, there was no significant increase in collaboration between researchers, indicating that researchers in various countries show limited enthusiasm for and attention to research cooperation, which needs to be improved. The top 10 authors are shown in Table 3, along with their nationality, total number of publications and cooperation centrality. The leading two authors are Hu, Jiwei and Wei, Xionghui, who are from Guizhou Normal University, China, and Peking University, China, respectively. Fan, Mingyi; Hu, Jiwei; and their research team have been committed to applying machine learning (such as artificial neural networks and back-propagation artificial neural networks) to model and optimize the removal of pollutants using nanoporous materials [20]. Alongside Chinese researchers, researchers from Egypt, South Korea, Spain and Saudi Arabia have also made significant contributions to the field.
Nasr, Mahmoud and his research team (from Alexandria University, Egypt) used machine learning and artificial intelligence techniques to predict Cu(II) adsorption from aqueous solutions onto nano zero-valent aluminum (nZVAl). According to their findings, the best modeling technique for predicting Cu(II) adsorption was an ANN, which achieved high accuracy with MSE < 10⁻⁵. Overall, further development of author contributions and collaborations is expected to bring breakthroughs in the application of machine learning to wastewater treatment.

3.5. Journal and Cited Journal Contributions

A systematic analysis of journal and cited journal contributions can help researchers identify important sources in the literature on machine learning and wastewater treatment, understand the research status of the field, target suitable journals and conduct more efficient research. Figure 8, Table 4 and Table 5 present the journal collaboration network, the top 10 journals and the top 10 cited journals in publications, respectively. Several findings are easy to identify: (1) Among the 1091 items retained, 305 items were published in the top 10 journals, accounting for approximately 28%, which indicates that a limited number of journals publish a relatively large share of the papers in the field. (2) Based on the rules of Journal Citation Reports (JCR), eight journals are in the first quartile with an impact factor (2023) greater than 5, demonstrating that these top journals are strongly recognized among researchers. This finding in Table 4 is consistent with Figure 8. (3) Correspondingly, Q1 journals accounted for 9 of the top 10 cited journals, with impact factors (2023) higher than 8. This suggests that papers published in these journals may be more authoritative and more widely recognized, and that other researchers prefer to take these journals as reliable and valuable sources for their research. (4) The majority of the top 10 journals are also listed among the top 10 cited journals, such as Water Research, Science of The Total Environment, Journal of Cleaner Production, Environmental Science & Technology, Water Science & Technology, Journal of Environmental Management and Chemosphere. Notably, Water Research ranks first in both the top 10 journals and the top 10 cited journals, demonstrating its significant status and strong recognition, which is also reflected in its high impact factor (2023) and category quartile.
Similarly, previous studies found that Water Research was the leading cited journal in relevant research. However, Water Research may not be the most productive journal, since AI-related content accounts for only a small portion of the content published in Water Research [13]. Therefore, researchers should make great efforts to strengthen the innovation of their research and the quality of their manuscripts to meet the publication requirements of these highly recognized journals, particularly Water Research and Science of The Total Environment.

3.6. Keywords Characteristics

Analyzing the keywords of the selected publications is an effective method of identifying research trends and hotspots. Figure 9 presents the top 25 keywords with the strongest citation bursts in the field of machine learning and wastewater treatment. In Figure 9, “Year” represents the burst time of each keyword, while “Begin” and “End” mark the duration of each keyword’s burst. The value of “Strength” represents the intensity of each keyword’s emergence. Based on the publication characteristics of the three stages, some valuable insights can be drawn from the keyword analysis. In the first stage, spanning from 1991 to 2015, keywords such as “expert system”, “activated sludge process” and “system” were significant terms, with “Strength” values of 3.7, 5.77 and 5.21, respectively. Research in the first stage focused more on the mechanism optimization of wastewater treatment systems based on the activated sludge process, even though the keyword “artificial neural network” already existed in 1993. Thus, the lack of connection between machine learning and wastewater treatment resulted in the slow growth of related publications during this period. In the second stage, keywords such as “artificial neural network”, “activated carbon”, “predictive control” and “aqueous solution” started to burst. This burst reveals that, rather than studying wastewater treatment mechanisms in more depth, researchers have become increasingly inclined to adopt machine learning methods to improve treatment performance and process control, which has led to a gradual increase in relevant research output.
In the third and current stage, spanning from 2020 to the present, keywords such as “time”, “reuse” and “impact” have become hot topics in this field, indicating that future research will focus on resource reuse, carbon reduction and other directions, in line with the current concept of carbon peaking and carbon neutrality.
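Burst detection as shown in Figure 9 is an involved procedure and is not reproduced here. The simpler keyword co-occurrence counting mentioned in the Introduction can, however, be sketched in a few lines: for each publication, every unordered pair of distinct keywords is counted once. The keyword lists below are invented for illustration, and the normalization (trimming and lowercasing) is an assumption rather than the rule applied by any particular tool.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(keyword_lists):
    """Count how often each unordered pair of keywords appears in the same publication.

    keyword_lists: one list of keyword strings per publication.
    Keywords are trimmed and lowercased so that spelling variants in case
    are merged; each pair is counted at most once per publication.
    """
    pair_counts = Counter()
    for keywords in keyword_lists:
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        pair_counts.update(combinations(unique, 2))
    return pair_counts


# Hypothetical keyword lists for three publications.
publications = [
    ["artificial neural network", "activated sludge process"],
    ["Artificial Neural Network", "predictive control", "activated sludge process"],
    ["reuse", "impact"],
]
cooccurrence = keyword_cooccurrence(publications)
for pair, count in cooccurrence.most_common(3):
    print(pair, count)
```

Pairs with high counts correspond to keywords that tend to be studied together, which is the kind of relationship a co-occurrence network visualizes.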

4. Discussion

Since the mechanism of the wastewater treatment process is complex, and its treatment performance is greatly affected by continuously fluctuating influent water quality and other factors, it is difficult to ensure that the treated effluent meets the discharge standards [21]. Meanwhile, the operation of WWTPs still depends mainly on the experience of operators, with a lack of intelligent operation and of awareness of energy saving and consumption reduction. Over the last few decades, in order to optimize the wastewater treatment process, researchers have devoted themselves to developing mechanistic models that describe the physico-chemical and biological processes taking place in wastewater treatment systems [22,23]. However, several mechanistic models that simulate well at the design stage, for example, the Activated Sludge Model series and Anaerobic Digestion Model 1, cannot be well adapted to the actual complex operating conditions of the wastewater treatment process [24,25,26]. With high efficiency in dealing with complex and multimodal problems, machine learning models are widely applied in various areas, including wastewater treatment [27,28,29,30]. In contrast to mechanistic models, which try to find the underlying reaction mechanisms of the wastewater treatment process, machine learning models focus on searching for empirical relationships among a large set of data to handle complex nonlinear problems [31,32]. To date, many machine learning models, such as ANNs [33,34], Long Short-Term Memory (LSTM) [35,36], Recurrent Neural Networks (RNNs) [37,38] and Convolutional Neural Networks (CNNs) [39], have been used primarily to predict the removal of pollutants and to optimally manage the wastewater treatment process. A brief analysis of the applications of machine learning models in wastewater treatment follows.

4.1. Future Research Prospects and Challenges—Pollutants Prediction

Owing to the presence of inorganic pollutants (e.g., heavy metals) and organic pollutants (e.g., persistent organic pollutants), wastewater has inevitably caused harm to the ecological environment and human health. Effective prediction of wastewater pollutants could provide valuable decision-making support for ensuring WWTP effluent water quality, consequently protecting the ecological environment and human health by mitigating the impact of wastewater on the ambient environment. To date, various machine learning models, including Random Forest (RF), Support Vector Machine (SVM), ANN, CNN and LSTM, have been successfully developed to predict pollutants such as COD, BOD, TN and TP [40]. In the study conducted by Zhiwei Guo et al., 2020 [23], the effluent COD and NH₄⁺-N were modeled using CNN, LSTM and a novel prediction model (PC-CR) that combines CNN and RNN. Their study demonstrated that the prediction accuracy of the proposed PC-CR model was about 10% better than a single CNN and 8% better than a single LSTM, indicating the superiority of the combination of CNN and RNN. Similarly, the concentration of heavy metals in industrial effluents, particularly in municipal wastewater, has also been predicted through model control [41,42]. It was previously reported that ANN, ANN-GA and ANN-particle swarm optimization models could predict Cu²⁺, Cd²⁺, As³⁺, Mn²⁺ and Cr⁶⁺ with an R² value of 0.95–1.0 [43]. Many studies have found that hybrid machine learning models combining individual models have higher accuracy and better prediction performance than single models. For instance, a Temporal Convolutional Network (TCN) and LSTM could be combined to form a new hybrid TCN-LSTM model for real-time effluent TN prediction [44], which could achieve 33.1% higher accuracy than the single TCN or LSTM model.
From these studies, it is easy to see that existing models have paid more attention to predicting pollutant values in the effluent than in the influent, even though large fluctuations in influent pollutants pose a serious threat to WWTP effluent water quality.
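The accuracy figures quoted above rely on metrics such as MSE and R². The sketch below computes both and compares two single models with a very simple combination that averages their predictions. The averaging is only a stand-in chosen for brevity: the hybrid models cited in the text (e.g., PC-CR or TCN-LSTM) combine network architectures rather than averaging outputs, and all numbers here are invented so that the two models' errors partly cancel. Averaging does not improve accuracy in general.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def average_predictions(pred_a, pred_b):
    """Simplest possible combination: the element-wise mean of two models' predictions."""
    return [(a + b) / 2 for a, b in zip(pred_a, pred_b)]


# Hypothetical observed effluent concentrations and two models' predictions.
observed = [10.0, 20.0, 30.0, 40.0]
model_a = [12.0, 22.0, 32.0, 42.0]   # tends to overpredict
model_b = [9.0, 19.0, 27.0, 39.0]    # tends to underpredict
combined = average_predictions(model_a, model_b)

for name, pred in [("model A", model_a), ("model B", model_b), ("combined", combined)]:
    print(name, mean_squared_error(observed, pred), round(r_squared(observed, pred), 4))
```

Reporting both metrics is useful because MSE depends on the units of the pollutant concentration, whereas R² is dimensionless and easier to compare across studies.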

4.2. Future Research Prospects and Challenges—Process Control

Besides excellent effluent quality, low energy consumption and low-carbon operation, which depend mainly on electricity and chemical consumption, are important factors in the economical and efficient operation of wastewater treatment plants. For example, a hybrid model combining statistical learning and deep reinforcement learning was developed to control pump operation, achieving a 16.7% decrease in electrical energy consumption together with a 97% reduction in the number of alarms [45]. Another study applied machine learning techniques based on multimodal strategies to intelligent aeration control in WWTPs [46] and reduced operation costs by 19.8% compared with conventional fuzzy control methods. Similar research on machine learning techniques for the intelligent control of aeration, dosing and equipment operation has been conducted by several researchers [47,48]. Overall, machine learning models for wastewater treatment process control have developed rapidly, driven by the goals of reducing energy consumption and carbon emissions and pursuing carbon neutrality.
Based on the above analysis, the following research directions and challenges will obtain more attention from future scholars: (1) Due to the particularity of the wastewater industry and the protection of privacy policy, the operation data of WWTPs are often not made public, making it difficult to obtain these data [49]. Future machine learning models need to address the challenges caused by the limitation of the training data. Moreover, the accuracy and completeness of the operation data are greatly affected by factors such as seasons, weather and the sensitivity of monitoring instruments [49]. Systematic implementation of rigorous data preprocessing protocols, encompassing noise reduction, feature normalization and missing value imputation, demonstrably enhances the generalization capability of predictive models by mitigating distributional shifts across disparate wastewater treatment datasets [50,51]. The substantial heterogeneity in wastewater characteristics arising from geographical disparities and diverse treatment processes poses significant challenges to model generalization, necessitating the development of adaptive transfer learning frameworks capable of domain-specific knowledge transfer while preserving operational robustness across heterogeneous scenarios. (2) The black-box nature of machine learning models in wastewater treatment persists as a critical limitation, with opaque decision-making logic and insufficient interpretability of domain-specific feature interactions undermining trust in predictive outcomes for operational decision-making. Future machine learning models require effective explainability methods such as Shapley value analysis to systematically decode decision logic and domain-specific operational principles. (3) Similar to effluent prediction, models of influent prediction will become a hot topic because effective prediction contributes to reducing the damage of influent sharp fluctuation to wastewater treatment effluent quality. 
(4) A series of policies and regulations on the removal of emerging contaminants, such as antibiotics (e.g., penicillin), will vigorously promote the development of related models. (5) Greater hybridization and comprehensiveness are trends in future machine learning models, which is in accordance with previous research [14]. A hybrid model can combine the strengths of two or more machine learning models or mechanistic models, and such models have been verified as effective for tackling limited data. (6) As discussed in previous studies, significant advancements in AI-based technologies have been achieved in intelligent water quality monitoring, innovative material development and energy cost optimization. In particular, machine learning algorithms such as Convolutional Neural Networks and Long Short-Term Memory networks have demonstrated remarkable capabilities in predicting process parameters [13,14]. However, most previous studies overlooked the rapidly increasing demand for real-time data acquisition to match highly efficient AI-based models. This demand would lead to faster wear and more frequent replacement of sensor devices working under the harsh environmental conditions of wastewater treatment plants. Thus, the development of advanced data acquisition devices and of AI models that can predict with partially missing data should be another focus of future research. Moreover, future models should be systematic, robust and multi-objective, simultaneously achieving pollutant prediction, process control, safety pre-warning and so on. It is worth mentioning that the future research directions proposed here are based only on the authors' understanding. Beyond their direct application in wastewater treatment processes, machine learning approaches can also be used to help predict how the methods themselves can be better applied in the field of water treatment in the future.
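To make the Shapley value analysis named in direction (2) more concrete, the following is a minimal sketch and is not taken from any of the cited studies. It computes exact Shapley values by enumerating feature subsets for a toy effluent COD function. The function `toy_effluent_cod`, its coefficients, the three feature names and the baseline values are all hypothetical and chosen only for illustration. Exact enumeration grows exponentially with the number of features, so practical work would rely on approximations such as those proposed in [50].

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x, measured against a baseline input."""
    n = len(x)
    features = list(range(n))

    def value(subset):
        # Features in the subset take their observed value; the rest stay at baseline.
        z = [x[i] if i in subset else baseline[i] for i in features]
        return predict(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for k in range(n):
            # Classical Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_effluent_cod(z):
    """Hypothetical model: effluent COD from influent COD, aeration and temperature."""
    influent_cod, aeration, temperature = z
    return (0.1 * influent_cod
            - 2.0 * aeration
            + 0.05 * influent_cod * (20.0 - temperature) / 20.0)


# Assumed operating point and reference (baseline) condition.
x = [400.0, 3.0, 12.0]
baseline = [300.0, 2.0, 20.0]
phi = shapley_values(toy_effluent_cod, x, baseline)

for name, contribution in zip(["influent_cod", "aeration", "temperature"], phi):
    print(f"{name}: {contribution:+.2f}")
```

The contributions sum to the difference between the prediction at `x` and at `baseline`, which is the property that lets an operator attribute a predicted change in effluent quality to individual inputs.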

5. Conclusions

With the aid of bibliometric analysis, this paper systematically and comprehensively analyzed the current development status and future trends of machine learning in the field of wastewater treatment. A visual review of 1226 publications obtained from the WoSCC database was conducted from different perspectives, including publications, countries, institutions, authors, journals, keywords, and research prospects and challenges. The primary finding is that the growth of worldwide publications can be roughly divided into three stages, and since 2020 the number of publications on related themes has risen visibly. Moreover, China and the United States are the two leading countries, with 342 and 209 publications, respectively. The top contributing authors and institutions are mainly located in developing countries, with China ranking first. Furthermore, almost all of the top 10 journals by publication count belong to the Q1 quartile (9/10). Remarkably, Water Research has received the most citations and played a critical role, indicating that this journal has a significant worldwide academic reputation in the related field. Based on the bibliometric analysis, the research prospects and challenges identified in this paper are expected to inspire future studies on machine learning and wastewater treatment: (1) Models for influent prediction will become a hot topic because effective prediction helps to reduce the shock loading that sharp influent fluctuations impose on effluent quality. (2) Greater hybridization and comprehensiveness are trends in future machine learning models, which can exploit the significant advantage of integrating machine learning models with mechanistic models. (3) Most previous studies overlooked the rapidly increasing demand for real-time data acquisition to match AI-based models. Thus, the development of advanced data acquisition devices that can work under the harsh environmental conditions of wastewater treatment plants, or of AI models that can predict with partially missing data, is a future need for the application of AI technology in wastewater treatment.
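As a simple illustration of what working with partially missing (default) sensor data can involve, the sketch below fills gaps in a sensor series with the last valid reading, standardizes the result and keeps a mask of which points were imputed. This is only one of many possible imputation strategies and is not a method proposed in this paper. The variable `raw_nh4` and its values are hypothetical.

```python
from statistics import mean, pstdev


def forward_fill(series):
    """Replace None gaps with the last valid reading.

    Leading gaps are filled with the first valid reading.
    """
    first_valid = next((v for v in series if v is not None), None)
    if first_valid is None:
        raise ValueError("series has no valid readings")
    filled, last = [], first_valid
    for v in series:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def zscore(series):
    """Standardize a gap-free series to zero mean and unit (population) variance."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(v - mu) / sigma for v in series]


def preprocess(series):
    """Return the imputed, standardized series and a mask of imputed positions."""
    mask = [v is None for v in series]
    return zscore(forward_fill(series)), mask


# Hypothetical ammonium readings (mg/L) with sensor dropouts marked as None.
raw_nh4 = [None, 21.0, 23.0, None, None, 27.0, 25.0]
clean, missing = preprocess(raw_nh4)
print(clean)
print(missing)
```

Passing the mask to a downstream model alongside the imputed values is one way to let the model distinguish measured points from filled ones.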

Author Contributions

Conceptualization, K.Z. and B.W.; methodology, K.Z.; writing—original draft preparation, K.Z.; writing—review and editing, B.W.; supervision, X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shanghai Municipal Engineering Design Institute (Group) Co., Ltd. [NO. K2022N004A] and the State Key Laboratory of Pollution Control and Resource Reuse [NO. PCRRF21015].

Data Availability Statement

The original data presented in the study are openly available on Web of Science at https://webofscience.clarivate.cn/wos/alldb/basic-search (accessed on 18 March 2025).

Conflicts of Interest

Authors Kun Zhou and Xin Zhang were employed by the company Shanghai Municipal Engineering Design Institute (Group) Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Xu, Z.; Xu, J.; Yin, H.; Jin, W.; Li, H.; He, Z. Urban river pollution control in developing countries. Nat. Sustain. 2019, 2, 158–159.
  2. Dunn, J.B.; Greene, K.; Vasquez-Arroyo, E.; Awais, M.; Gomez-Sanabria, A.; Kyle, P.; Palatnik, R.R.; Schaeffer, R.; Zhou, P.; Aissaoui, B. Toward Enhancing Wastewater Treatment with Resource Recovery in Integrated Assessment and Computable General Equilibrium Models. Environ. Sci. Technol. Lett. 2024, 11, 654–663.
  3. Qu, J.; Ren, H.; Wang, H.; Wang, K.; Yu, G.; Ke, B.; Yu, H.-Q.; Zheng, X.; Li, J. Concept wastewater treatment plants in China. In Pathways to Water Sector Decarbonization, Carbon Capture and Utilization; IWA Publishing: London, UK, 2022; p. 265.
  4. Krause, M.J.; Bronstein, K.E. Estimating national sludge generation and disposal from US drinking water and wastewater treatment plants. J. Clean. Prod. 2024, 453, 142121.
  5. Wang, C.; Chai, X.; Lu, B.; Lu, W.; Han, H.; Mu, Y.; Gu, Q.; Wu, B. Integrated control strategy for dual sludge ages in the high-concentration powder carrier bio-fluidized bed (HPB) technology: Enhancing municipal wastewater treatment efficiency. J. Environ. Manag. 2024, 351, 119890.
  6. Murshid, S.; Antonysamy, A.; Dhakshinamoorthy, G.; Jayaseelan, A.; Pugazhendhi, A. A review on biofilm-based reactors for wastewater treatment: Recent advancements in biofilm carriers, kinetics, reactors, economics, and future perspectives. Sci. Total Environ. 2023, 892, 164796.
  7. Zhang, C.; Zhao, G.; Jiao, Y.; Quan, B.; Lu, W.; Su, P.; Tang, Y.; Wang, J.; Wu, M.; Xiao, N. Critical analysis on the transformation and upgrading strategy of Chinese municipal wastewater treatment plants: Towards sustainable water remediation and zero carbon emissions. Sci. Total Environ. 2023, 896, 165201.
  8. Daigger, G.T.; Kuo, J.; Derlon, N.; Houweling, D.; Jimenez, J.A.; Johnson, B.R.; McQuarrie, J.P.; Murthy, S.; Regmi, P.; Roche, C.; et al. Biological and physical selectors for mobile biofilms, aerobic granules, and densified-biological flocs in continuously flowing wastewater treatment processes: A state-of-the-art review. Water Res. 2023, 242, 120245.
  9. Kayan, G.Ö.; Kayan, A. Polycaprolactone composites/blends and their applications especially in water treatment. ChemEngineering 2023, 7, 104.
  10. Yaman, H.; Baig, M.T.; Kayan, A. Synthesis and Characterization of Tetrasubstituted Porphyrin Tin (IV) Complexes and Their Adsorption Properties over Tetracycline Antibiotics. Reactions 2025, 6, 12.
  11. Abidoye, L.K.; Mahdi, F.M.; Idris, M.O.; Alabi, O.O.; Wahab, A.A. ANN-derived equation and its application in the prediction of dielectric properties of pure and impure CO2. J. Clean. Prod. 2018, 175, 123–132.
  12. Jun, M.A.; Ding, Y.; Cheng, J.C.P.; Jiang, F.; Wan, Z. A Temporal-Spatial Interpolation and Extrapolation Method Based on Geographic Long Short-Term Memory Neural Network for PM2.5. J. Clean. Prod. 2019, 237, 117729.
  13. Baarimah, A.O.; Bazel, M.A.; Alaloul, W.S.; Alazaiza, M.Y.; Al-Zghoul, T.M.; Almuhaya, B.; Khan, A.; Mushtaha, A.W. Artificial intelligence in wastewater treatment: Research trends and future perspectives through bibliometric analysis. Case Stud. Chem. Environ. Eng. 2024, 10, 100926.
  14. Li, X.; Su, J.; Wang, H.; Boczkaj, G.; Mahlknecht, J.; Singh, S.V.; Wang, C. Bibliometric analysis of artificial intelligence in wastewater treatment: Current status, research progress, and future prospects. J. Environ. Chem. Eng. 2024, 12, 113152.
  15. Jin, L.; Sun, X.; Ren, H.; Huang, H. Biological filtration for wastewater treatment in the 21st century: A data-driven analysis of hotspots, challenges and prospects. Sci. Total Environ. 2023, 855, 158951.
  16. Zhang, S.; Jin, Y.; Chen, W.; Wang, J.; Wang, Y.; Ren, H. Artificial intelligence in wastewater treatment: A data-driven analysis of status and trends. Chemosphere 2023, 336, 139163.
  17. Li, L.; Hua, Y.; Zhao, S.; Yang, D.; Chen, S.; Song, Q.; Gao, J.; Dai, X. Worldwide research progress and trend in sludge treatment and disposal: A bibliometric analysis. ACS EST Eng. 2023, 3, 1083–1097.
  18. Ma, J.; Zheng, L.; Yu, F. Current status and future prospects of biochar application in electrochemical energy storage devices: A bibliometric review. Desalination 2024, 581, 117597.
  19. Plan, S. The National Artificial Intelligence Research and Development Strategic Plan; National Science and Technology Council, Networking and Information Technology Research and Development Subcommittee: Washington, DC, USA, 2016.
  20. Fan, M.; Hu, J.; Cao, R.; Ruan, W.; Wei, X. A review on experimental design for pollutants removal in water treatment with the aid of artificial intelligence. Chemosphere 2018, 200, 330.
  21. Alvi, M.; Batstone, D.; Mbamba, C.K.; Keymer, P.; French, T.; Ward, A.; Dwyer, J.; Cardell-Oliver, R. Deep learning in wastewater treatment: A critical review. Water Res. 2023, 245, 120518.
  22. Croll, H.C.; Ikuma, K.; Ong, S.K.; Sarkar, S. Reinforcement learning applied to wastewater treatment process control optimization: Approaches, challenges, and path forward. Crit. Rev. Environ. Sci. Technol. 2023, 53, 1775–1794.
  23. Guo, Z.; Du, B.; Wang, J.; Shen, Y.; Li, Q.; Feng, D.; Gao, X.; Wang, H. Data-driven prediction and control of wastewater treatment process through the combination of convolutional neural network and recurrent neural network. RSC Adv. 2020, 10, 13410–13419.
  24. Henze, M.; Gujer, W.; Mino, T.; Van Loosdrecht, M. Activated Sludge Models ASM1, ASM2, ASM2d and ASM3; IWA Publishing: London, UK, 2006.
  25. Iacopozzi, I.; Innocenti, V.; Marsili-Libelli, S.; Giusti, E. A modified Activated Sludge Model No. 3 (ASM3) with two-step nitrification–denitrification. Environ. Model. Softw. 2007, 22, 847–861.
  26. Fenu, A.; Guglielmi, G.; Jimenez, J.; Spèrandio, M.; Saroj, D.; Lesjean, B.; Brepols, C.; Thoeye, C.; Nopens, I. Activated sludge model (ASM) based modelling of membrane bioreactor (MBR) processes: A critical review with special regard to MBR specificities. Water Res. 2010, 44, 4272–4294.
  27. Hwangbo, S.; Al, R.; Chen, X.; Sin, G. Integrated model for understanding N2O emissions from wastewater treatment plants: A deep learning approach. Environ. Sci. Technol. 2021, 55, 2143–2151.
  28. Thompson, K.A.; Dickenson, E.R. Using machine learning classification to detect simulated increases of de facto reuse and urban stormwater surges in surface water. Water Res. 2021, 204, 117556.
  29. Mahadevkar, S.V.; Khemani, B.; Patil, S.; Kotecha, K.; Vora, D.R.; Abraham, A.; Gabralla, L.A. A review on machine learning styles in computer vision—Techniques and future directions. IEEE Access 2022, 10, 107293–107329.
  30. Ureel, Y.; Dobbelaere, M.R.; Ouyang, Y.; De Ras, K.; Sabbe, M.K.; Marin, G.B.; Van Geem, K.M. Active machine learning for chemical engineers: A bright future lies ahead! Engineering 2023, 27, 23–30.
  31. Mamandipoor, B.; Majd, M.; Sheikhalishahi, S.; Modena, C.; Osmani, V. Monitoring and detecting faults in wastewater treatment plants using deep learning. Environ. Monit. Assess. 2020, 192, 148.
  32. Krishnaraj, A.; Deka, P.C. Spatial and temporal variations in river water quality of the Middle Ganga Basin using unsupervised machine learning techniques. Environ. Monit. Assess. 2020, 192, 744.
  33. Vakili, M.; Mojiri, A.; Kindaichi, T.; Cagnetta, G.; Yuan, J.; Wang, B.; Giwa, A.S. Cross-linked chitosan/zeolite as a fixed-bed column for organic micropollutants removal from aqueous solution, optimization with RSM and artificial neural network. J. Environ. Manag. 2019, 250, 109434.
  34. Kang, J.-H.; Song, J.; Yoo, S.S.; Lee, B.-J.; Ji, H.W. Prediction of odor concentration emitted from wastewater treatment plant using an artificial neural network (ANN). Atmosphere 2020, 11, 784.
  35. Yaqub, M.; Asif, H.; Kim, S.; Lee, W. Modeling of a full-scale sewage treatment plant to predict the nutrient removal efficiency using a long short-term memory (LSTM) neural network. J. Water Process Eng. 2020, 37, 101388.
  36. Pisa, I.; Santin, I.; Morell, A.; Vicario, J.L.; Vilanova, R. LSTM-based wastewater treatment plants operation strategies for effluent quality improvement. IEEE Access 2019, 7, 159773–159786.
  37. Cheng, T.; Harrou, F.; Kadri, F.; Sun, Y.; Leiknes, T. Forecasting of wastewater treatment plant key features using deep learning-based models: A case study. IEEE Access 2020, 8, 184475–184485.
  38. Qiao, J.; Huang, X.; Han, H. Recurrent neural network-based control for wastewater treatment process. In Proceedings of the Advances in Neural Networks–ISNN 2012: 9th International Symposium on Neural Networks, Shenyang, China, 11–14 July 2012. Proceedings, Part II 9.
  39. Wang, Z.; Man, Y.; Hu, Y.; Li, J.; Hong, M.; Cui, P. A deep learning based dynamic COD prediction model for urban sewage. Environ. Sci. Water Res. Technol. 2019, 5, 2210–2218.
  40. Zaghloul, M.S.; Achari, G. Application of machine learning techniques to model a full-scale wastewater treatment plant with biological nutrient removal. J. Environ. Chem. Eng. 2022, 10, 107430.
  41. Ke, B.; Nguyen, H.; Bui, X.-N.; Bui, H.-B.; Choi, Y.; Zhou, J.; Moayedi, H.; Costache, R.; Nguyen-Trang, T. Predicting the sorption efficiency of heavy metal based on the biochar characteristics, metal sources, and environmental conditions using various novel hybrid machine learning models. Chemosphere 2021, 276, 130204.
  42. Bhagat, S.K.; Tung, T.M.; Yaseen, Z.M. Development of artificial intelligence for modeling wastewater heavy metal removal: State of the art, application assessment and possible future research. J. Clean. Prod. 2020, 250, 119473.
  43. Zhao, L.; Dai, T.; Qiao, Z.; Sun, P.; Hao, J.; Yang, Y. Application of artificial intelligence to wastewater treatment: A bibliometric analysis and systematic review of technology, economy, management, and wastewater reuse. Process Saf. Environ. Prot. 2020, 133, 169–182.
  44. Xie, Y.; Chen, Y.; Wei, Q.; Yin, H. A hybrid deep learning approach to improve real-time effluent quality prediction in wastewater treatment plant. Water Res. 2024, 250, 121092.
  45. Filipe, J.; Bessa, R.J.; Reis, M.; Alves, R.; Póvoa, P. Data-driven predictive energy optimization in a wastewater pumping station. Appl. Energy 2019, 252, 113423.
  46. Wang, H.-C.; Wang, Y.-Q.; Wang, X.; Yin, W.-X.; Yu, T.-C.; Xue, C.-H.; Wang, A.-J. Multimodal machine learning guides low carbon aeration strategies in urban wastewater treatment. Engineering 2024, 36, 51–62.
  47. Wang, Y.-Q.; Wang, H.-C.; Song, Y.-P.; Zhou, S.-Q.; Li, Q.-N.; Liang, B.; Liu, W.-Z.; Zhao, Y.-W.; Wang, A.-J. Machine learning framework for intelligent aeration control in wastewater treatment plants: Automatic feature engineering based on variation sliding layer. Water Res. 2023, 246, 120676.
  48. Icke, O.; Van Es, D.; de Koning, M.; Wuister, J.; Ng, J.; Phua, K.; Koh, Y.; Chan, W.; Tao, G. Performance improvement of wastewater treatment processes by application of machine learning. Water Sci. Technol. 2020, 82, 2671–2680.
  49. Kazadi Mbamba, C.; Batstone, D. Optimization of deep learning models with genetic algorithms for forecasting performance in water industry. Comput. Chem. Eng. 2023, 175, 108276.
  50. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4768–4777.
  51. Li, K.; Duan, H.; Liu, L.; Qiu, R.; van den Akker, B.; Ni, B.-J.; Chen, T.; Yin, H.; Yuan, Z.; Ye, L. An integrated first principal and deep learning approach for modeling nitrous oxide emissions from wastewater treatment plants. Environ. Sci. Technol. 2022, 56, 2816–2826.
Figure 1. General structure of this study.
Figure 2. Growth trends of worldwide publications.
Figure 3. Growth trends of publications of China.
Figure 4. Visual representation result of country publication contributions.
Figure 5. Countries collaboration network analysis.
Figure 6. Institutions collaboration network analysis.
Figure 7. Authors collaboration network analysis.
Figure 8. Journals collaboration network analysis.
Figure 9. Top 25 keywords with the strongest citation bursts.
Table 1. Top 10 countries in publications about machine learning and wastewater treatment.

| Ranking | Countries | Count | Centrality |
|---|---|---|---|
| 1 | China | 342 (highest) | 0.08 |
| 2 | USA | 209 | 0.22 (highest) |
| 3 | India | 128 | 0.13 |
| 4 | Saudi Arabia | 92 | 0.14 |
| 5 | South Korea | 83 | 0.09 |
| 6 | Iran | 79 | 0.13 |
| 8 | Spain | 66 | 0.14 |
| 8 | England | 66 | 0.13 |
| 9 | Australia | 60 | 0.03 |
| 10 | Canada | 57 | 0.08 |
Table 2. Top 10 institutions in publications about machine learning and wastewater treatment.

| Ranking | Institutions | Countries | Count | Centrality |
|---|---|---|---|---|
| 1 | Chinese Acad Sci | China | 31 (highest) | 0.06 |
| 2 | Harbin Inst Technol | China | 19 | 0.01 |
| 2 | Univ Technol Sydney | Australia | 19 | 0.06 |
| 4 | King Khalid Univ | Saudi Arabia | 17 | 0.04 |
| 4 | Univ Tehran | Iran | 17 | 0.02 |
| 6 | Islamic Azad Univ | Iran | 16 | 0.06 |
| 7 | Duy Tan Univ | Vietnam | 15 | 0.08 (highest) |
| 8 | Guizhou Normal Univ | China | 14 | 0.01 |
| 8 | Tsinghua Univ | China | 14 | 0.06 |
| 10 | King Fahd Univ Petr & Minerals | Saudi Arabia | 13 | 0.02 |
Table 3. Top 10 authors in publications about machine learning and wastewater treatment.

| Ranking | Authors | Countries | Count | Centrality |
|---|---|---|---|---|
| 1 | Hu, Jiwei | China | 12 (highest) | 0.00 |
| 2 | Wei, Xionghui | China | 9 | 0.00 |
| 3 | Nasr, Mahmoud | Egypt | 8 | 0.00 |
| 3 | Cho, Kyung Hwa | South Korea | 8 | 0.00 |
| 5 | Mahmoud, Ahmed S | Egypt | 7 | 0.00 |
| 8 | Poch, M | Spain | 6 | 0.00 |
| 8 | Comas, J | Spain | 6 | 0.00 |
| 8 | Cortés, U | Spain | 6 | 0.00 |
| 8 | Rezk, Hegazy | Saudi Arabia | 5 | 0.00 |
| 10 | Huang, Xianfei | China | 5 | 0.00 |
Table 4. Top 10 journals in publications about machine learning and wastewater treatment.

| Ranking | Journals | Count | Journal Impact Factor (2023) | Category Quartile |
|---|---|---|---|---|
| 1 | Water Research | 41 (highest) | 11.4 | Q1 |
| 2 | Science of The Total Environment | 40 | 8.2 | Q1 |
| 3 | Water | 35 | 3.0 | Q2 |
| 3 | Journal of Cleaner Production | 31 | 9.7 | Q1 |
| 5 | Environmental Science & Technology | 29 | 10.8 | Q1 |
| 8 | Journal of Water Process Engineering | 29 | 6.3 | Q1 |
| 8 | Water Science & Technology | 28 | 2.5 | Q3 |
| 8 | Journal of Environmental Management | 27 | 8.0 | Q1 |
| 8 | Environmental Science and Pollution Research | 23 | no data | no data |
| 10 | Chemosphere | 22 | 8.1 | Q1 |
Table 5. Top 10 cited journals in publications about machine learning and wastewater treatment.

| Ranking | Journals | Count | Centrality | Journal Impact Factor (2023) | Category Quartile |
|---|---|---|---|---|---|
| 1 | Water Research | 574 (highest) | 0.07 | 11.4 | Q1 |
| 2 | Science of The Total Environment | 461 | 0.00 | 8.2 | Q1 |
| 3 | Chemical Engineering Journal | 386 | 0.13 | 13.3 | Q1 |
| 4 | Water Science & Technology | 375 | 0.03 | 2.5 | Q3 |
| 5 | Journal of Environmental Management | 364 | 0.03 | 2.7 | Q3 |
| 6 | Chemosphere | 355 | 0.10 | 8.1 | Q1 |
| 7 | Environmental Science & Technology | 342 | 0.13 | 10.8 | Q1 |
| 8 | Bioresource Technology | 334 | 0.08 | 9.7 | Q1 |
| 9 | Journal of Hazardous Materials | 329 | 0.01 | 12.2 | Q1 |
| 10 | Journal of Cleaner Production | 326 | 0.01 | 9.7 | Q1 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhou, K.; Wu, B.; Zhang, X. Worldwide Research Progress and Trends in Application of Machine Learning to Wastewater Treatment: A Bibliometric Analysis. Water 2025, 17, 1314. https://doi.org/10.3390/w17091314
