Pipe Fault Prediction for Water Transmission Mains
Abstract
1. Introduction
2. Methodology
2.1. Segmentation Methods
- Fixed. Each waterline is divided into segments of a fixed length. Specifically, we used a segment length of 360 m, which is the average length of the segments produced by the Dynamic segmentation method described next.
- Dynamic. The Dynamic segmentation method was proposed by water main experts and is intended to provide a segmentation that is meaningful for water companies' use cases. We describe this segmentation method below.
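The Fixed segmentation can be sketched as follows. This is a minimal illustration that assumes each waterline is represented only by its total length in metres and that segments are measured as distances along the line. The function name `fixed_segments` is hypothetical, and letting the final segment keep whatever length remains is our assumption, since the text does not say how a leftover shorter than 360 m is handled.

```python
from typing import List, Tuple


def fixed_segments(line_length_m: float, seg_len_m: float = 360.0) -> List[Tuple[float, float]]:
    """Split a waterline into consecutive (start, end) intervals along its length.

    Every interval is seg_len_m long except possibly the last one, which keeps
    the remainder (an assumption; the text does not specify this case).
    """
    if line_length_m <= 0 or seg_len_m <= 0:
        raise ValueError("lengths must be positive")
    segments = []
    start = 0.0
    while start < line_length_m:
        end = min(start + seg_len_m, line_length_m)
        segments.append((start, end))
        start = end
    return segments


print(fixed_segments(1000.0))
# [(0.0, 360.0), (360.0, 720.0), (720.0, 1000.0)]
```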
2.2. Fault Prediction Process
- Training: we use either Fixed or Dynamic segmentation to divide the waterlines into segments. The FaultCount feature of each segment is the number of faults that occurred within the segment's bounds in period A. The other, static features of the segments, such as age or material, are also calculated. The label of each segment is the number of faults that occurred in that segment during period B. In this way, the prediction algorithm learns to predict faults while taking into account both the static features and the fault occurrences in the previous period.
- Test: we again use a segmentation algorithm to split the waterlines into segments, but this time we consider only faults from period B. Note that the Dynamic segmentation may yield a different segmentation for the test set (period B) than for the training set (period A). The features of each segment are its FaultCount in period B and its static features, and its label is the number of faults that occurred in period C. The trained model receives as input the static features and the FaultCount of each segment in period B and predicts the number of faults in period C. These predictions are then compared with the actual number of faults in period C to evaluate the prediction model.
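The construction of training and test rows described above can be sketched in a few lines. This assumes, hypothetically, that each fault is recorded as a waterline ID, a position along the line in metres, and a date, and that a segment is a half-open interval `[start_m, end_m)` on one line. The class and function names are illustrative, static features are omitted for brevity, and the period dates and faults in the usage part are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Segment:
    line_id: str    # hypothetical waterline identifier
    start_m: float  # start position along the waterline (metres)
    end_m: float    # end position along the waterline (metres)


@dataclass(frozen=True)
class Fault:
    line_id: str
    position_m: float
    day: date


Period = Tuple[date, date]  # half-open [start, end)


def count_faults(segments: List[Segment], faults: List[Fault], period: Period) -> Dict[Segment, int]:
    """Number of faults inside each segment's bounds during the given period."""
    start, end = period
    counts = {s: 0 for s in segments}
    for f in faults:
        if not (start <= f.day < end):
            continue
        for s in segments:
            if s.line_id == f.line_id and s.start_m <= f.position_m < s.end_m:
                counts[s] += 1
                break
    return counts


def build_rows(segments: List[Segment], faults: List[Fault], feature_period: Period, label_period: Period):
    """One row per segment: (segment, FaultCount in feature_period, label = faults in label_period)."""
    x = count_faults(segments, faults, feature_period)
    y = count_faults(segments, faults, label_period)
    return [(s, x[s], y[s]) for s in segments]


# Usage with invented periods and faults
A = (date(2016, 6, 1), date(2017, 7, 1))
B = (date(2017, 7, 1), date(2018, 8, 1))
segs = [Segment("L1", 0.0, 360.0), Segment("L1", 360.0, 720.0)]
faults = [
    Fault("L1", 100.0, date(2017, 1, 5)),
    Fault("L1", 400.0, date(2017, 3, 1)),
    Fault("L1", 120.0, date(2018, 2, 1)),
]
train_rows = build_rows(segs, faults, feature_period=A, label_period=B)
```

For the test set, the same function would be called with period B as `feature_period`, period C as `label_period`, and possibly a different list of segments, since the Dynamic segmentation is recomputed.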
2.3. Prediction Algorithms
- FaultCount: for the training set, the number of faults in period A; for the test set, the number of faults in period B (raw).
- Age: the time difference, in years, between 1 January 2020 and the installation date of the segment (raw).
- LifeExpectancy: the experts' estimate of the segment's lifetime, based on its material (expert).
- TimeCondition: a binary measure. If the segment is made of concrete and was manufactured after 1965, or if it is made of steel and was manufactured between 1988 and 1995, the score is 1; otherwise, it is 0 (raw).
- MaterialScore: a score given by experts according to the material of the segment (expert).
- AgeScore: a score based on the ratio between Age and LifeExpectancy (expert).
- LengthScore: a score based on the length of the segment and its FaultCount (expert).
- Basic: a Linear Regression algorithm using the MaterialScore, AgeScore, and LengthScore features.
- Full: a Linear Regression algorithm using all the features described above.
- RF Regression: a Random Forest Regression algorithm using all the features described above.
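The two raw features with explicit rules, Age and TimeCondition, translate directly into code. This sketch assumes a year of 365.25 days for Age, treats "after 1965" as strictly later than 1965, and treats "between 1988 and 1995" as inclusive; those boundary choices and the function names are our assumptions. The expert features (LifeExpectancy, MaterialScore, AgeScore, LengthScore) are not reproduced because their mappings are not given here.

```python
from datetime import date

REFERENCE_DATE = date(2020, 1, 1)  # reference date given in the text for Age


def age_years(installed: date, reference: date = REFERENCE_DATE) -> float:
    """Age in years between installation and the reference date (assumes 365.25-day years)."""
    return (reference - installed).days / 365.25


def time_condition(material: str, year_manufactured: int) -> int:
    """1 for concrete made after 1965 or steel made in 1988-1995 (inclusive, assumed), else 0."""
    m = material.strip().lower()
    if m == "concrete" and year_manufactured > 1965:
        return 1
    if m == "steel" and 1988 <= year_manufactured <= 1995:
        return 1
    return 0
```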
3. Evaluation
3.1. Experimental Setup
3.1.1. Data Description and Preprocessing
- Geographic Information System (GIS). Mekorot's GIS contains geographic data about all the waterlines in Mekorot's network. Lines are partitioned into sub-lines, each of which is associated with information about its material.
- Systems, Applications and Products in Data Processing (SAP). Mekorot's SAP contains (1) operational characteristics of the water pipes, including the age of each pipe, and (2) water pipe failure reports from June 2016 to December 2019.
- 13 waterlines were removed because they lacked information about their construction type.
- An additional 247 waterlines were removed because the given coordinates of their segments could not be connected to each other.
- 32 fault reports were discarded because they lacked information about the pipe in which the fault occurred.
- 1395 were discarded because they lacked information either about the quality of the measurement or about whether the fault was a burst or a leakage.
- 1235 were discarded because they were not bursts or leakages.
- 127 were discarded because of low quality or calculation errors.
- 40 were discarded because they lacked information about their location in the pipes.
- 276 were discarded because they contained an invalid line ID.
- 63 were discarded because their location was invalid.
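The cleaning of fault reports listed above can be organised as an ordered list of discard rules, with each discarded report counted under the first rule it violates. This is a sketch: the field names (`pipe_id`, `quality`, `fault_type`, `position_m`, `line_id`), the quality value `"low"`, and the set of valid line IDs are hypothetical, and applying the rules sequentially is our assumption about how the reported counts were obtained.

```python
from collections import Counter
from typing import Callable, List, Tuple

Rule = Tuple[str, Callable[[dict], bool]]  # (reason, predicate returning True if the report is discarded)

VALID_LINE_IDS = {"L1", "L2"}  # hypothetical set of known waterline IDs

# Order follows the list in the text; later rules rely on earlier ones having
# removed reports with missing fields.
RULES: List[Rule] = [
    ("missing pipe", lambda r: r.get("pipe_id") is None),
    ("missing quality or fault type", lambda r: r.get("quality") is None or r.get("fault_type") is None),
    ("not a burst or leakage", lambda r: r["fault_type"] not in {"burst", "leakage"}),
    ("low quality", lambda r: r["quality"] == "low"),
    ("missing location", lambda r: r.get("position_m") is None),
    ("invalid line ID", lambda r: r.get("line_id") not in VALID_LINE_IDS),
    ("invalid location", lambda r: r["position_m"] < 0),
]


def filter_reports(reports: List[dict], rules: List[Rule] = RULES) -> Tuple[List[dict], Counter]:
    """Return the kept reports and a count of discarded reports per reason."""
    kept: List[dict] = []
    discarded: Counter = Counter()
    for report in reports:
        for reason, should_discard in rules:
            if should_discard(report):
                discarded[reason] += 1  # counted under the first violated rule only
                break
        else:
            kept.append(report)
    return kept, discarded
```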
3.1.2. Rule-Based Model
- Construction Type Score: a score that is based on the material of the waterline.
- Workload Score: the ratio of the waterline's operational pressure to its nominal pressure.
- Age Score: the waterline’s age divided by its life expectancy. A waterline’s life expectancy is set according to the type of the waterline. The range of this feature is [0,1].
- Faults to Length Score: the number of leakages and bursts in the segment to be replaced in the last 5 years divided by the length of that segment. The exact way the Length feature is computed is as follows. Let ReplacementLength be the length of the segment to be replaced, and let FaultCount be the number of faults in that segment in the past five years. The LengthRatio is defined as . The Length Score feature is computed as .
- Consumers Score: a score based on the population size in the pipe's distribution region, the consumption type (agricultural or domestic), and the average flow rate.
- Potential Damage Score: a score based on the potential cost of repairing the damage and the potential cost of compensating the consumers and the landowners.
- Analysis Score: experts’ recommendation.
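Three of the rule-based sub-scores above are simple ratios and can be written down directly. The sketch below is illustrative: the function names are hypothetical, clipping the Age Score to [0, 1] is our assumption about how the stated range is enforced, the length unit for the fault ratio is not specified in the text, and the way Mekorot combines the sub-scores into a final ranking is not reproduced.

```python
def workload_score(operational_pressure: float, nominal_pressure: float) -> float:
    """Operational pressure divided by nominal pressure."""
    return operational_pressure / nominal_pressure


def age_score(age_years: float, life_expectancy_years: float) -> float:
    """Age divided by life expectancy, clipped to [0, 1] (the clipping is an assumption)."""
    return min(max(age_years / life_expectancy_years, 0.0), 1.0)


def faults_to_length(fault_count_5y: int, segment_length: float) -> float:
    """Leakages and bursts in the last five years per unit length of the segment to be replaced."""
    return fault_count_5y / segment_length
```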
3.1.3. Metrics
3.1.4. Training and Test Sets
- 2+ faults: we considered only the segments with at least two faults in period A.
  - (a) For the Fixed segmentation, we kept only the segments with at least two faults in period A, which left 99 segments.
  - (b) The Dynamic segmentation, by definition, ensures that all of the resulting segments contain at least two faults in period A. In total, this segmentation method yielded 102 segments.
- ALL segments: we used all of the waterline data for the training set.
  - (a) For the Fixed segmentation, we considered all the segments (22,516 segments).
  - (b) For the Dynamic segmentation, we extended the segmentation to also include segments with fewer than two faults. To do this, we generated a segment from every group of faults that occurred within 300 m of each other, including groups that contained only a single fault. Afterwards, we divided the remaining parts of the waterlines that still did not belong to any segment into fixed-size segments of 293 m each, which was the average length of the other segments. In total, we had 29,620 segments.
- 2+ faults (test set): we kept only the segments with at least two faults in period B. In total, we had:
  - (a) 99 segments in the Fixed segmentation;
  - (b) 104 segments in the Dynamic segmentation.
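The extension of the Dynamic segmentation used for the ALL configuration can be sketched for a single waterline as below. We read "faults that occurred within 300 m of each other" as chaining consecutive faults along the line whose gap is at most 300 m, and we let a group span from its first to its last fault, so a single-fault group becomes a zero-length placeholder; both readings are assumptions, as are all function names. The expert Dynamic segmentation used for the 2+ configuration is not reproduced here.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # [start_m, end_m] along one waterline


def group_faults(positions: List[float], max_gap_m: float = 300.0) -> List[List[float]]:
    """Chain faults along the line: a fault joins the current group when it lies
    at most max_gap_m beyond the previous fault; otherwise it starts a new group."""
    groups: List[List[float]] = []
    for p in sorted(positions):
        if groups and p - groups[-1][-1] <= max_gap_m:
            groups[-1].append(p)
        else:
            groups.append([p])
    return groups


def fill_remaining(line_length_m: float, covered: List[Interval], piece_m: float = 293.0) -> List[Interval]:
    """Cut every stretch of [0, line_length_m] not covered by an interval into
    consecutive pieces of piece_m metres (the last piece of a stretch may be shorter)."""
    pieces: List[Interval] = []
    cursor = 0.0
    # a zero-length sentinel at the end of the line closes the final uncovered stretch
    for start, end in sorted(covered) + [(line_length_m, line_length_m)]:
        s = cursor
        while s < start:
            e = min(s + piece_m, start)
            pieces.append((s, e))
            s = e
        cursor = max(cursor, end)
    return pieces


def extended_dynamic_segments(line_length_m: float, positions: List[float],
                              max_gap_m: float = 300.0, piece_m: float = 293.0):
    groups = group_faults(positions, max_gap_m)
    fault_segments = [(g[0], g[-1]) for g in groups]  # assumption: span from first to last fault
    return fault_segments, fill_remaining(line_length_m, fault_segments, piece_m)


fault_segs, filler = extended_dynamic_segments(1000.0, [800.0, 100.0, 250.0])
```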
3.2. Results
3.2.1. Methods Evaluation
- In all configurations, at least one regression-based model outperforms Mekorot, improving on its RMSE by at least 5%, on its Conover result by at least 4%, and on its Kendall's Tau result by at least 13%.
- In the Dynamic segmentation, both Full and RF Regression outperform Mekorot, and RF Regression generally performs best, with improvements of up to 8.7% in RMSE, up to 14.3% in Conover, and up to 31.8% in Kendall's Tau.
- Full and RF Regression should be trained on all data, whereas Basic should be trained only on segments with 2+ faults.
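RMSE, Kendall's Tau, and the relative improvements quoted above can be computed as follows. This sketch uses Kendall's tau-a for brevity; the variant is not specified here, and with many tied fault counts the tie-corrected tau-b may be preferable. The Conover and TOP 10 metrics are not sketched because their definitions are not given in this section. Measuring improvement as the relative change with respect to the Mekorot value is our assumption; for Kendall's Tau, the RF (Normal) and Mekorot values in the geographical-features table (0.29 versus 0.22) give 31.8%, the same figure reported above.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between actual and predicted fault counts."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / number of pairs; tied pairs count as neither."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def relative_improvement(baseline: float, model: float, higher_is_better: bool) -> float:
    """Relative improvement of a model over the baseline (assumed definition)."""
    gain = model - baseline if higher_is_better else baseline - model
    return gain / baseline


print(round(100 * relative_improvement(0.22, 0.29, higher_is_better=True), 1))  # prints 31.8
```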
3.2.2. The Impact of the Training Set Size
- We split our past-fault data into four periods of 10 months each, instead of the usual division into three periods of 13 months. This allowed us to use two 10-month periods, A and B, as the training set, thereby increasing the training set size from 29,943 segments to 60,508 segments. We then fed the faults from period C into the trained prediction model to predict the faults in period D, and finally compared the predictions with the actual number of faults in that period (20 months).
- We split the data into four periods, as in the previous experiment, but trained the model on a total of 30 months: the 20 months from the previous experiment plus an additional 10 months obtained from a window overlapping periods A and B, namely the last five months of period A and the first five months of period B. In total, the training set consisted of 90,451 segments. We then tested the model using periods C and D, as described above (30 months).
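The period layout of these two experiments can be made explicit with month indices. In this sketch, months are counted from the start of the 40-month span, the window names are hypothetical, and deriving each label window as the equal-length window immediately after its feature window is our assumption, carried over from the Training/Test procedure; the text only states which feature windows are used.

```python
from typing import Dict, Tuple

Window = Tuple[int, int]  # [first_month, end_month) counted from the start of the 40-month span


def four_period_windows(period_len: int = 10) -> Dict[str, Window]:
    """Periods A-D of period_len months plus the window overlapping A and B."""
    w = {name: (k * period_len, (k + 1) * period_len) for k, name in enumerate("ABCD")}
    half = period_len // 2
    w["AB_overlap"] = (w["A"][1] - half, w["B"][0] + half)  # last half of A + first half of B
    return w


def label_window(feature_window: Window) -> Window:
    """Assumption: labels come from the equal-length window right after the features."""
    start, end = feature_window
    return (end, end + (end - start))


windows = four_period_windows()
training_windows = {  # feature windows per configuration, as described in the text
    "20 months": ["A", "B"],
    "30 months": ["A", "B", "AB_overlap"],
}
test_features, test_labels = windows["C"], windows["D"]
```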
3.2.3. The Impact of Geographical Features
- The number of faults in the entire waterline to which the segment belongs (Fault Count).
- The Euclidean distance between the segment and the closest fault, as well as the number of faults within a certain radius of the segment (Fault Distance).
- The number of GIS subsegments that exist in the segment (GIS segments).
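The Fault Distance features can be sketched with a point-to-segment distance. This assumes planar coordinates in metres and approximates each segment by the straight line between its endpoints; real segments are polylines. The radius is not specified in the text, so `radius_m` is a hypothetical parameter, and whether faults lying on the segment itself should be excluded is also unspecified (here every fault is included).

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # planar (x, y) coordinates in metres (assumption)


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the straight segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)  # degenerate segment: distance to point a
    # project p onto the line through a and b, then clamp the projection to the segment
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def fault_distance_features(a: Point, b: Point, faults: List[Point], radius_m: float) -> Tuple[float, int]:
    """(distance to the closest fault, number of faults within radius_m of the segment)."""
    distances = [point_segment_distance(f, a, b) for f in faults]
    closest = min(distances) if distances else math.inf
    within = sum(1 for d in distances if d <= radius_m)
    return closest, within


print(fault_distance_features((0.0, 0.0), (100.0, 0.0),
                              [(50.0, 30.0), (150.0, 0.0), (300.0, 400.0)], radius_m=100.0))
# (30.0, 2)
```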
4. Discussion
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
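Set | Segments | Segmentation | Number of Segments |
---|---|---|---|
Training | 2+ faults | Fixed | 99 |
Training | 2+ faults | Dynamic | 102 |
Training | ALL | Fixed | 22,516 |
Training | ALL | Dynamic | 29,620 |
Test | 2+ faults | Fixed | 99 |
Test | 2+ faults | Dynamic | 104 |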
Algorithm | Training Set | Training Set Size (segments) | RMSE | Conover | Kendall's Tau | TOP 10 |
---|---|---|---|---|---|---|
RF | 10 months | 29,943 | 0.91 | 13.32 | 0.11 | 5 |
RF | 20 months | 60,508 | 0.79 | 13.35 | 0.19 | 5 |
RF | 30 months | 90,451 | 0.82 | 12.61 | 0.16 | 5 |
Mekorot | 10 months | 29,943 | 0.83 | 15.1 | 0.15 | 6 |
Mekorot | 20 months | 60,508 | 0.83 | 15.1 | 0.15 | 6 |
Mekorot | 30 months | 90,451 | 0.83 | 15.1 | 0.15 | 6 |
Algorithm | Setting | RMSE | Conover | Kendall's Tau | TOP 10 |
---|---|---|---|---|---|
RF | Normal | 0.83 | 17.38 | 0.29 | 3 |
RF | Fault Count | 0.84 | 18.88 | 0.26 | 4 |
RF | Fault Distance | 0.86 | 21.67 | 0.15 | 2 |
RF | GIS segments | 0.83 | 18.7 | 0.26 | 2 |
RF | All | 0.84 | 19.23 | 0.24 | 3 |
Mekorot | | 0.91 | 20.98 | 0.22 | 3 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Gorenstein, A.; Kalech, M.; Hanusch, D.F.; Hassid, S. Pipe Fault Prediction for Water Transmission Mains. Water 2020, 12, 2861. https://doi.org/10.3390/w12102861