Nearest-Better Network-Assisted Fitness Landscape Analysis of Contaminant Source Identification in Water Distribution Network
Abstract
1. Introduction
2. Related Work
2.1. Fitness Landscape
- A set of potential solutions to the problem;
- A notion of neighborhood, nearness, distance, or accessibility on that set; and
- A fitness function. The fitness of a solution indicates how good the solution is; the larger the fitness value, the better the solution.
- Search space: The search space is the set of all possible solutions of an optimization problem;
- Neighborhood: The neighborhood relationship is a mapping that associates each solution with a set of candidate solutions, called neighbors, which can be reached by applying a local search operator in a single step or which are defined by a distance metric;
- Basin of Attraction (BoA) and local optimum: The BoA of a local optimum is the set of solutions from which a local search strategy converges to that optimum within the decision variable space [11].
- Modality [14]: Modality refers to the number of local optima in a fitness landscape. A problem is unimodal if it has a single optimum, which is then the global optimum, whereas a problem is multimodal if there are multiple local optima;
- Ruggedness: A rugged landscape can be regarded as a multimodal landscape with many sharp descents and ascents;
- Neutrality [15]: The area around a solution is neutral if the fitness values of its neighboring solutions do not differ from its own;
- Separability [16]: Separability refers to the degree of dependency among the components of the decision variable;
- Dynamic change [17]: Dynamic means that the fitness or constraint functions may change over time.
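The definitions above can be made concrete with a toy example. The following is a minimal sketch, not taken from the paper: it assumes a one-dimensional integer search space, a neighborhood made of the two adjacent points, maximization, and a best-improvement hill climber as the local search strategy. The names `neighbors`, `hill_climb`, and `basins` are hypothetical.

```python
from typing import Callable, Dict, List


def neighbors(x: int, size: int) -> List[int]:
    """Neighborhood on a 1-D integer line: the adjacent points inside the space."""
    return [y for y in (x - 1, x + 1) if 0 <= y < size]


def hill_climb(x: int, fitness: Callable[[int], float], size: int) -> int:
    """Move to the best neighbor while it is strictly better; return the local optimum."""
    while True:
        best = max(neighbors(x, size), key=fitness)
        if fitness(best) <= fitness(x):
            # No strictly better neighbor: x is a local optimum
            # (points on a plateau are treated as optima in this sketch).
            return x
        x = best


def basins(fitness: Callable[[int], float], size: int) -> Dict[int, List[int]]:
    """Map each local optimum to its basin of attraction (BoA)."""
    boa: Dict[int, List[int]] = {}
    for x in range(size):
        boa.setdefault(hill_climb(x, fitness, size), []).append(x)
    return boa


# Toy fitness values (larger is better), chosen only for illustration.
values = [1, 3, 2, 0, 4, 6, 5]
boa = basins(values.__getitem__, len(values))
modality = len(boa)  # number of local optima found by the local search
```

Here the search space is `range(len(values))`, the BoA of each optimum is the list of starting points that climb to it, and `modality` counts the optima. With a different neighborhood or local search operator the basins would generally change, which is why the neighborhood is part of the landscape definition.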
2.2. Features Derived from CSIWDN
2.2.1. Modality
2.2.2. Dynamic Change
2.2.3. Separability
2.2.4. Other Features
3. Fitness Landscape Analysis Method
3.1. Nearest-Better Network
3.1.1. Optimization Problem for CSIWDN
3.1.2. Definition of the Distance
3.1.3. Metrics for Fitness Landscape Analysis
- Ruggedness: A rugged landscape can be regarded as a multimodal landscape with many sharp descents and ascents. The nearest-better distance (NBD) is the distance between a solution and its nearest-better solution. If the standard deviation of the NBD is relatively large, this indicates that the fitness landscape has many sharp descents and ascents.
- Neutrality: Neutral space refers to areas where fitness values show little variation. Thus, the area of this neutral space can be calculated as a metric for neutrality as follows:
- Differences between two fitness landscapes: This paper also needs to analyze the dynamic change and separability of the problem. The key to analyzing these two properties is to examine the differences between two fitness landscapes, so this paper proposes a new fitness landscape metric for that purpose. As illustrated in Figure 6, counting the number of solutions with different fitness values is not a sufficient indicator of the difference between two fitness landscapes. In the figure, only a few blue solutions (on the edge of the fitness landscape) have the same fitness value, so it seems that only the edges of the basins of attraction remain unchanged. In fact, the two fitness landscapes are very similar, with only three changed areas: (1) the optimal solution of the peak on the left has shifted slightly; (2) the BoA on the right has become larger and, correspondingly, the BoA on the left has become smaller; (3) the two BoAs on the right have merged into one. It is precisely in these three changed areas that the nearest-better relationships differ. Therefore, the difference between two fitness landscapes can be evaluated by counting, for each solution, whether its nearest-better relationship differs. The formula is as follows:
- Modality in the biased data set: In previous research [8], optimal solutions were identified solely by the magnitude of the nearest-better distance (NBD) of the solutions. However, this approach is unsuitable for an NBN generated from biased data, because the distribution of solutions generated by the algorithm is non-uniform. In the early stages of evolution, the algorithm’s search radius is relatively large, resulting in higher NBD values in some poorer regions. Consequently, some solutions may be mistakenly identified as local optima because of their larger NBD, even though optimal solutions should have better fitness values. In the biased data-based NBN, fitness and NBD are therefore integrated to identify optima, as shown in the following equation:
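The nearest-better relationship behind the ruggedness and difference metrics can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes maximization, Euclidean distance on real-valued solutions, strict improvement when looking for a better solution, and a difference score normalized by the sample size. The function names are hypothetical, and the neutrality and biased-data modality formulas are not covered because their exact form is not reproduced here.

```python
import math
import statistics
from typing import List, Optional, Sequence, Tuple

NBN = List[Tuple[Optional[int], float]]


def nearest_better(points: Sequence[Sequence[float]],
                   fitness: Sequence[float]) -> NBN:
    """For each solution, return (index of its nearest better solution, NBD).

    The best solution has no better solution, so it gets (None, inf).
    """
    nbn: NBN = []
    for p, f in zip(points, fitness):
        parent, nbd = None, math.inf
        for j, (q, g) in enumerate(zip(points, fitness)):
            if g > f:  # strictly better only; tie handling is an assumption
                d = math.dist(p, q)
                if d < nbd:
                    parent, nbd = j, d
        nbn.append((parent, nbd))
    return nbn


def ruggedness(nbn: NBN) -> float:
    """Standard deviation of the finite nearest-better distances."""
    return statistics.pstdev([d for _, d in nbn if math.isfinite(d)])


def landscape_difference(nbn_a: NBN, nbn_b: NBN) -> float:
    """Fraction of solutions whose nearest-better solution differs.

    Both networks must be built on the same sampled solutions, in the same
    order. Normalizing by the sample size is an assumption of this sketch.
    """
    changed = sum(pa != pb for (pa, _), (pb, _) in zip(nbn_a, nbn_b))
    return changed / len(nbn_a)


# Five sampled solutions on a line, evaluated on two toy landscapes in which
# the left peak shifts from the second to the third solution.
points = [(0.0,), (1.0,), (2.0,), (3.5,), (4.0,)]
fitness_a = [1.0, 3.0, 2.0, 0.0, 4.0]
fitness_b = [1.0, 2.0, 3.0, 0.0, 4.0]

nbn_a = nearest_better(points, fitness_a)
nbn_b = nearest_better(points, fitness_b)
difference = landscape_difference(nbn_a, nbn_b)
```

In this toy case four of the five fitness values are unchanged, yet only the two solutions around the shifted peak change their nearest-better solution, which is the kind of localized change the difference metric is meant to isolate. The pairwise search is quadratic in the number of samples, so a spatial index would be needed for large sample sets.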
4. Fitness Landscape Analysis
4.1. Sampling Method
- Sampling in continuous search space
  - Global sampling in the continuous space centered at a given central solution: To observe the landscape features of the problem in the continuous space, the pollution source location components of the sampled solutions are set to those of the central solution, while the pollution injection information component is sampled randomly;
  - Local sampling in the continuous space centered at a given central solution: To further investigate the landscape features of the problem in the continuous space, the samples are centered around the central solution, keeping the pollution source locations the same as those of the central solution while performing local sampling on the pollution injection information component within a sampling radius r;
- Sampling in combinatorial search space centered at a given central solution: In the global sampling of the combinatorial space, the pollution injection information component of the sampled solutions is set to match that of the central solution, while the pollution source locations component is varied. Because the number of pollution sources is relatively small, the number of solutions in the combinatorial space, which depends on the number of nodes in the water distribution network, is limited. Therefore, it is feasible to conduct a complete sampling of the entire combinatorial solution space;
- Sampling in the whole search space: To analyze the original fitness landscape features of the CSIWDN problem, this paper also performs global random sampling in the original solution space to generate a set of sampled solutions;
- Sampling by algorithm: The algorithm focuses more on solutions with better fitness, making its sampling biased. The set of sampled solutions generated by the algorithm allows for an indirect analysis of the algorithm’s behavior. This paper applies the Adaptive Multi-Population Algorithm [3] as the sampling method, which is designed to account for various landscape features of the problem, such as dynamic changes, multimodality, and separability. Moreover, experiments show that this algorithm outperforms other algorithms. In the sampling, the algorithm is run independently more than 30 times with the recommended parameters.
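The non-algorithmic sampling schemes above can be sketched as follows. This is an illustrative sketch under assumptions that are not stated in the text: a solution is represented as a pair (tuple of source node indices, tuple of real-valued injection parameters), the injection parameters lie in box bounds, local sampling perturbs each parameter uniformly within radius r and clips to the bounds, and the combinatorial space is enumerated as ordered tuples of nodes. All function names are hypothetical.

```python
import itertools
import random
from typing import List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]
Solution = Tuple[Tuple[int, ...], Tuple[float, ...]]


def random_injection(bounds: Bounds, rng: random.Random) -> Tuple[float, ...]:
    """Draw one injection vector uniformly from the box bounds."""
    return tuple(rng.uniform(lo, hi) for lo, hi in bounds)


def global_continuous(center: Solution, bounds: Bounds, n: int,
                      rng: random.Random) -> List[Solution]:
    """Keep the source locations of the center; sample injections globally."""
    locations, _ = center
    return [(locations, random_injection(bounds, rng)) for _ in range(n)]


def local_continuous(center: Solution, bounds: Bounds, r: float, n: int,
                     rng: random.Random) -> List[Solution]:
    """Keep the source locations; perturb each injection value within radius r."""
    locations, injection = center
    samples = []
    for _ in range(n):
        perturbed = tuple(min(hi, max(lo, v + rng.uniform(-r, r)))
                          for v, (lo, hi) in zip(injection, bounds))
        samples.append((locations, perturbed))
    return samples


def full_combinatorial(center: Solution, num_nodes: int) -> List[Solution]:
    """Keep the injection of the center; enumerate every location tuple."""
    locations, injection = center
    return [(combo, injection)
            for combo in itertools.product(range(num_nodes), repeat=len(locations))]


def whole_space(num_nodes: int, num_sources: int, bounds: Bounds, n: int,
                rng: random.Random) -> List[Solution]:
    """Sample both the locations and the injections at random."""
    return [(tuple(rng.randrange(num_nodes) for _ in range(num_sources)),
             random_injection(bounds, rng)) for _ in range(n)]


rng = random.Random(0)
center: Solution = ((0, 2, 5), (0.5, 0.2))
bounds = [(0.0, 1.0), (0.0, 1.0)]
global_set = global_continuous(center, bounds, 10, rng)
local_set = local_continuous(center, bounds, 0.05, 10, rng)
combinatorial_set = full_combinatorial(center, num_nodes=6)
whole_set = whole_space(6, 3, bounds, 10, rng)
```

Enumerating ordered tuples grows quickly with the number of nodes and sources, so a complete enumeration is only practical when the number of sources is small, as the text notes.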
4.2. Analysis of CSIWDN
4.2.1. Neutrality
4.2.2. Ruggedness
4.2.3. Modality
4.2.4. Dynamic Change
4.3. Separability
- For two different contamination source positions, sampling data are generated with identical distributions of the pollutant injection information components, and the differences between the two fitness landscapes are analyzed. The two sampled solution sets are constructed so that, for every index i, the i-th sampled solutions in the two sets have the same injection information components. Given that the CSIWDN problem involves multiple contamination sources, a phase in which only the third contamination source is discharging pollutants is selected for a clearer comparison of the influence of source positions. The contamination source position components of the two solutions are set such that only the third contamination source position differs, which excludes the effects of the other two contamination sources;
- For two different sets of pollutant injection information, sampling data are generated with identical distributions of the contamination source position components, and the differences between the two fitness landscapes are analyzed. The two sampled solution sets are constructed so that, for every index i, the i-th sampled solutions in the two sets have the same contamination source position components.
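The paired design described in these two cases can be sketched as below. This is a hedged illustration rather than the paper's procedure: it assumes a solution is a pair (tuple of source node indices, tuple of injection parameters), that injection parameters are drawn uniformly from box bounds, and that node indices are drawn uniformly at random. The function names and the example values are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Solution = Tuple[Tuple[int, ...], Tuple[float, ...]]


def paired_by_position(positions_a: Tuple[int, ...], positions_b: Tuple[int, ...],
                       bounds: Sequence[Tuple[float, float]], n: int,
                       rng: random.Random) -> Tuple[List[Solution], List[Solution]]:
    """Two sample sets whose i-th solutions share the same injection vector
    but use different source positions."""
    injections = [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(n)]
    return ([(positions_a, inj) for inj in injections],
            [(positions_b, inj) for inj in injections])


def paired_by_injection(injection_a: Tuple[float, ...], injection_b: Tuple[float, ...],
                        num_nodes: int, num_sources: int, n: int,
                        rng: random.Random) -> Tuple[List[Solution], List[Solution]]:
    """Two sample sets whose i-th solutions share the same source positions
    but use different injection vectors."""
    positions = [tuple(rng.randrange(num_nodes) for _ in range(num_sources))
                 for _ in range(n)]
    return ([(pos, injection_a) for pos in positions],
            [(pos, injection_b) for pos in positions])


rng = random.Random(1)
bounds = [(0.0, 1.0), (0.0, 1.0)]

# Only the third source position differs between the two sets.
set_a, set_b = paired_by_position((4, 9, 12), (4, 9, 30), bounds, 20, rng)

# Only the injection information differs between the two sets.
set_c, set_d = paired_by_injection((0.3, 0.7), (0.8, 0.1), 40, 3, 20, rng)
```

Because the i-th solutions of each pair of sets differ in exactly one component, any difference measured between the two resulting fitness landscapes can be attributed to that component, which is what a separability analysis needs.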
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
CSIWDN | Contaminant Source Identification in Water Distribution Networks |
NBN | Nearest-Better Network |
NBD | Nearest-Better Distance |
References
1. Costa, D.M.; Melo, L.F.; Martins, F.G. Localization of Contamination Sources in Drinking Water Distribution Systems: A Method Based on Successive Positive Readings of Sensors. Water Resour. Manag. 2013, 27, 4623–4635.
2. Taormina, R.; Galelli, S. Deep-Learning Approach to the Detection and Localization of Cyber-Physical Attacks on Water Distribution Systems. J. Water Resour. Plan. Manag. 2018, 144, 04018065.
3. Li, C.; Yang, R.; Zhou, L.; Zeng, S.; Mavrovouniotis, M.; Yang, M.; Yang, S.; Wu, M. Adaptive Multipopulation Evolutionary Algorithm for Contamination Source Identification in Water Distribution Systems. J. Water Resour. Plan. Manag. 2021, 147, 04021014.
4. Seth, A.; Klise, K.A.; Siirola, J.D.; Haxton, T.; Laird, C.D. Testing Contamination Source Identification Methods for Water Distribution Networks. J. Water Resour. Plan. Manag. 2016, 142, 04016001.
5. Piazza, S.; Sambito, M.; Freni, G. Analysis of Optimal Sensor Placement in Looped Water Distribution Networks Using Different Water Quality Models. Water 2023, 15, 559.
6. Yan, X.; Zhao, J.; Hu, C.; Zeng, D. Multimodal optimization problem in contamination source determination of water supply networks. Swarm Evol. Comput. 2019, 47, 66–71.
7. Kerschke, P.; Trautmann, H. Automated Algorithm Selection on Continuous Black-Box Problems by Combining Exploratory Landscape Analysis and Machine Learning. Evol. Comput. 2019, 27, 99–127.
8. Diao, Y.; Li, C.; Zeng, S.; Yang, S.; Coello, C.A.C. Nearest-Better Network for Fitness Landscape Analysis of Continuous Optimization Problems. IEEE Trans. Evol. Comput. 2024. early access.
9. Diao, Y.; Li, C.; Zeng, S.; Yang, S. Nearest Better Network for Visualization of the Fitness Landscape. In Proceedings of the GECCO ’23 Companion: Companion Conference on Genetic and Evolutionary Computation, Lisbon, Portugal, 15–19 July 2023; pp. 815–818.
10. Stadler, P.F. Fitness landscapes. In Biological Evolution and Statistical Physics; Springer: Berlin/Heidelberg, Germany, 2002; pp. 183–204.
11. Zou, F.; Chen, D.; Liu, H.; Cao, S.; Ji, X.; Zhang, Y. A survey of fitness landscape analysis for optimization. Neurocomputing 2022, 503, 129–139.
12. Malan, K.M.; Engelbrecht, A.P. A survey of techniques for characterising fitness landscapes and some possible ways forward. Inf. Sci. 2013, 241, 148–163.
13. Malan, K.M. A Survey of Advances in Landscape Analysis for Optimisation. Algorithms 2021, 14, 40.
14. Horn, J.; Goldberg, D.E. Genetic Algorithm Difficulty and the Modality of Fitness Landscapes. In Foundations of Genetic Algorithms; Whitley, L.D., Vose, M.D., Eds.; Elsevier: Amsterdam, The Netherlands, 1995; Volume 3, pp. 243–269.
15. Reidys, C.M.; Stadler, P.F. Neutrality in fitness landscapes. Appl. Math. Comput. 2001, 117, 321–350.
16. Davidor, Y. Epistasis Variance: A Viewpoint on GA-Hardness. In Foundations of Genetic Algorithms; Rawlins, G.J., Ed.; Elsevier: Amsterdam, The Netherlands, 1991; Volume 1, pp. 23–35.
17. Mavrovouniotis, M.; Li, C.; Yang, S. A survey of swarm intelligence for dynamic optimization: Algorithms and applications. Swarm Evol. Comput. 2017, 33, 1–17.
18. Liu, L.; Ranjithan, S.R.; Mahinthakumar, G. Contamination Source Identification in Water Distribution Systems Using an Adaptive Dynamic Optimization Procedure. J. Water Resour. Plan. Manag. 2011, 137, 183–192.
19. Rasekh, A.; Brumbelow, K. A dynamic simulation–optimization model for adaptive management of urban water distribution system contamination threats. Appl. Soft Comput. 2015, 32, 59–71.
20. Yan, X.; Zhao, J.; Hu, C.; Wu, Q. Contaminant source identification in water distribution network based on hybrid encoding. J. Comput. Methods Sci. Eng. 2016, 16, 379–390.
21. Gong, J.; Yan, X.; Hu, C.; Wu, Q. Collaborative based pollution sources identification algorithm in water supply sensor networks. Desalin. Water Treat. 2019, 168, 123–135.
22. Grbčić, L.; Kranjčević, L.; Družeta, S. Machine learning and simulation-optimization coupling for water distribution network contamination source detection. Sensors 2021, 21, 1157.
23. Qian, K.; Jiang, J.; Ding, Y.; Yang, S.H. DLGEA: A deep learning guided evolutionary algorithm for water contamination source identification. Neural Comput. Appl. 2021, 33, 11889–11903.
0.564 | 0.655 | 0.702 | 0.478 | 0.618 |
1 | 2 | 3 | 4 | 5 | 6 | 7 | |
0.394 | 0.361 | 0.357 | 0.477 | 0.310 | 0.312 | 0.419 | |
8 | 9 | 10 | 11 | 12 | 13 | 14 | |
0.411 | 0.406 | 0.402 | 0.277 | 0.312 | 0.311 | 0.311 | |
0.9 | 0.1 | 0.01 | 0.001 | 0.0001 | 0.00001 | 0.000001 | |
0.010 | 0.000 | 0.001 | 0.001 | 0.003 | 0.003 | 0.012 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Diao, Y.; Li, C.; Zeng, S.; Yang, S. Nearest-Better Network-Assisted Fitness Landscape Analysis of Contaminant Source Identification in Water Distribution Network. Data 2024, 9, 142. https://doi.org/10.3390/data9120142