Peer-Review Record

Pipe Fault Prediction for Water Transmission Mains

Water 2020, 12(10), 2861; https://doi.org/10.3390/w12102861

by Ariel Gorenstein ¹, Meir Kalech ¹,*, Daniela Fuchs Hanusch ² and Sharon Hassid ³

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Reviewer 4: Anonymous

Reviewer 5: Anonymous

Submission received: 5 September 2020 / Revised: 5 October 2020 / Accepted: 9 October 2020 / Published: 14 October 2020

(This article belongs to the Section Hydraulics and Hydrodynamics)

Round 1

Reviewer 1 Report

Reviews:

Aiming to accurately predict faults such as bursts, breaks and leakages in waterlines, this paper established a pipe fault prediction model. The segmentation methods and fault prediction algorithms which are particularly linear regression algorithm and random forest regression algorithm were adopted in the study. Results compared the performance between the data-driven algorithms and the rule-based model, which turned out that the data-driven algorithms performed better. However, the paper requires major revisions before being accepted.

In the Abstract and Conclusions sections, the quantitative conclusions in this paper are missing. It is not intuitive for readers to understand the work authors had done in the paper. Please enhance the specific numerical results in these sections.
Highlight the novelty in this study. The literature reviews were well done in the Introduction section but the novelty of this study is not clear. Please enhance the innovation of this study while comparing with other literature applying machine learning methods such as Logistic Regression, GBT, ANN and so on to predict pipe faults.
In Data Description and Processing section, some features which were given or estimated by experts such as Life expectancy, Material score and so on were utilized to the data-driven algorithm. However, opposite the rule-based model in this paper, the data-driven algorithm should be applied to train and gain knowledge from objective pipeline data. Please justify the rationality of the feature selection and revise this section.
In Data Description and Processing section, the selected features are all focused on the pipeline. However, the features about flowing water such as velocity, pressure and acid gas content and features about the underground environment such as temperature also contribute a lot to the pipe faults. These features were not taken into account in this paper. Please justify the rationality of the feature selection and revise this section.
Please carefully review the whole context. There are some formatting and spelling mistakes in the paper. For instance, in line 3, page 6, the word lines should be corrected as Lines.
In this paper, the Linear Regression algorithm and Random Forest algorithm are adopted. However, no deeper introduction of how is the process of data input and output during training process and how the prediction achieve was presented. The basic algorithm principle and a calculation flow chart should be enhanced to ensure readers having a better understanding on the prediction model.
In the Fault Prediction Process section, authors split the dataset of past faults to three equal time periods. As a supervised learning model, the features adopted for training were from period A while the labels were from period B. Moreover, the testing process adopted features and labels from period B and period C. It is a little bit confusing since generally the dataset is split in transverse such as 85% of the dataset is for training while the rest 15% is for testing to preserve the data integrity of each pipe. Please justify the rationality of the dataset segmentation.
According to the way of dataset segmentation and feature selection in this paper, it seems that the value of the feature Age is unclear. For instance, for a segment of pipe in 3-year-old during period A values 24 months, whether it means that the segment is 3-year-old at the beginning of period A or at the end of period A? Please justify it.
In the Metrics section, the metric Conover measures the ranking or the prediction fault counts versus that of actual. However, the way of ranking is a little bit confusing. For the actual fault counts vector [1,3,5,6,3,3] which ranks [0,2,4,5,2,2], the prediction fault counts vector [2,4,6,6,9,3] ranks [0,2,2,4,5,2] instead of [0,2,3,3,5,1]. Please justify the rationality of it or revise the ranking principle.
In the Results section, the impact of geographical features is discussed. However, the features adopted to the prediction model exclude the Fault Distance and the number of GIS segments. Please justify the logic between these features and the prediction model.
Please cite the literature as below:

[1] Song H, Du S, Wang R, et al. Potential for Vertical Heterogeneity Prediction in Reservoir Basing on Machine Learning Methods[J]. Geofluids, 2020, 2020.

[2] Du S, Wang R, Wei C, et al. The connectivity evaluation among wells in reservoir utilizing machine learning methods[J]. IEEE Access, 2020, 8: 47209-47219.

[3] Zhang L, Wang R, Song H, et al. Numerical Investigation of Techno-Economic Multiobjective Optimization of Geothermal Water Reservoir Development: A Case Study of China[J]. Water, 2019, 11(11): 2323.

[4] Zhang Q, Wei C, Wang Y, et al. Potential for prediction of water saturation distribution in reservoirs utilizing machine learning methods[J]. Energies, 2019, 12(19): 3597.

Author Response

Reviewer 1

R1.1. The reviewer wrote:

In the Abstract and Conclusions sections, the quantitative conclusions in this paper are missing. It is not intuitive for readers to understand the work authors had done in the paper. Please enhance the specific numerical results in these sections.

Our response:

In the new version, we added some of the quantitative results to the abstract, introduction and conclusion.

R1.2. The reviewer wrote:

Highlight the novelty in this study. The literature reviews were well done in the Introduction section but the novelty of this study is not clear. Please enhance the innovation of this study while comparing with other literature applying machine learning methods such as Logistic Regression, GBT, ANN and so on to predict pipe faults.

Our response:

Thanks for this comment. We have added a paragraph to the Introduction that describes the novelty of our approach compared to other research in the field. To do so, we restructured the Introduction slightly, so that the research gap we identified is described right before the paragraph that presents the approach introduced in this paper.

In brief, the novelty of our work is:

To the best of our knowledge, fault prediction models for transmission mains have so far concentrated on predicting the probability of failure on pipe segments. In our approach, we predict the number of failures that will occur in a specific future time period. This allows us to rank pipes for maintenance based on the expected number of failures and not only on the probability of failure occurrence. Further, expected maintenance costs can be calculated in advance, which allows for better decision support on whether to repair or replace a pipe segment.
Further, we introduced a novel segmentation process. Little work can be found so far on the appropriate segmentation of long pipelines. Nevertheless, segmentation is essential in large transmission mains networks, as faults are often concentrated on a shorter segment of a long pipeline.
We looked at the spatial dependence of faults, which, e.g., the review paper of Scheidegger et al. 2015 pointed out as a missing issue in pipe failure prediction models [1].

R1.3. The reviewer wrote:

In Data Description and Processing section, some features which were given or estimated by experts such as Life expectancy, Material score and so on were utilized to the data-driven algorithm. However, opposite the rule-based model in this paper, the data-driven algorithm should be applied to train and gain knowledge from objective pipeline data. Please justify the rationality of the feature selection and revise this section.

Our response:

Typically, feature extraction in data-driven approaches includes two kinds of features: (1) raw data features and (2) features extracted based on experts’ knowledge. In this paper, we implemented these two kinds of features. In the revised version we distinguish between these two kinds of features and note for each feature its type.

R1.4. The reviewer wrote:

In Data Description and Processing section, the selected features are all focused on the pipeline. However, the features about flowing water such as velocity, pressure and acid gas content and features about the underground environment such as temperature also contribute a lot to the pipe faults. These features were not taken into account in this paper. Please justify the rationality of the feature selection and revise this section.

Our response:

We agree with the reviewer that this kind of data may be beneficial for fault prediction. Unfortunately, we did not have this data; we used all the recorded data that Mekorot collected. In the revised version we consider this point in a new discussion section.

R1.5. The reviewer wrote:

Please carefully review the whole context. There are some formatting and spelling mistakes in the paper. For instance, in line 3, page 6, the word lines should be corrected as Lines.

Our response:

We went over the paper and fixed all typos, grammar and other minor mistakes.

R1.6. The reviewer wrote:

In this paper, the Linear Regression algorithm and Random Forest algorithm are adopted. However, no deeper introduction of how is the process of data input and output during the training process and how the prediction achieve was presented. The basic algorithm principle and a calculation flow chart should be enhanced to ensure readers having a better understanding on the prediction model.

Our response:

We added an explanation of the process of training and predicting done by those two algorithms in the section that presents the prediction algorithms.

R1.7. The reviewer wrote:

In the Fault Prediction Process section, authors split the dataset of past faults to three equal time periods. As a supervised learning model, the features adopted for training were from period A while the labels were from period B. Moreover, the testing process adopted features and labels from period B and period C. It is a little bit confusing since generally the dataset is split in transverse such as 85% of the dataset is for training while the rest 15% is for testing to preserve the data integrity of each pipe. Please justify the rationality of the dataset segmentation.

Our response:

We had two options for selecting the training set: (1) as the reviewer suggested, dividing the segments into two unequal sets, with 85% of the segments used for training and 15% for testing; or (2) as done in the paper, dividing the time into periods, with periods A and B used for training and periods B and C for testing. We preferred the second option because learning a fault prediction model from a training set implicitly assumes that the training set and the test set are drawn from the same distribution. This assumption explains why, by learning from the training set (in our case, historical segments of the waterline), we can predict the label of a new segment in the test set. Using the same segments for training and testing (but in different time periods) guarantees this assumption. In the first approach, by contrast, dividing the segments into two sets makes it more probable that the test set distribution will differ slightly from that of the training set. We clarify this point in the revised version.

Moreover, we believe that as more data is collected over time, the training set will grow significantly and the prediction model will improve. We show this trend in Section 3.2.2, where we increased the training set to 2/3 of the data.
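The time-based split described in this response can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the field names (`diameter_mm`, `length_m`, `faults_A`, `faults_B`, `faults_C`) and the helper `build_temporal_split` are hypothetical, and the actual feature set of the paper is not reproduced here.

```python
def build_temporal_split(segments):
    """Build train/test sets from the same segments in different time periods.

    Training pairs period-A features with period-B fault counts as labels;
    testing pairs period-B features with period-C fault counts as labels.
    All field names are hypothetical placeholders.
    """
    train_X, train_y, test_X, test_y = [], [], [], []
    for seg in segments:
        # Static attributes are shared by both sets (assumed features).
        static = [seg["diameter_mm"], seg["length_m"]]
        # Train: fault history observed in period A, label from period B.
        train_X.append(static + [seg["faults_A"]])
        train_y.append(seg["faults_B"])
        # Test: fault history observed in period B, label from period C.
        test_X.append(static + [seg["faults_B"]])
        test_y.append(seg["faults_C"])
    return (train_X, train_y), (test_X, test_y)


segments = [
    {"diameter_mm": 600, "length_m": 1200, "faults_A": 2, "faults_B": 3, "faults_C": 1},
    {"diameter_mm": 900, "length_m": 800, "faults_A": 0, "faults_B": 1, "faults_C": 2},
]
(train_X, train_y), (test_X, test_y) = build_temporal_split(segments)
```

Every segment appears in both sets, so the training and test populations are identical and only the observation window shifts, which is the property the response relies on.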

R1.8. The reviewer wrote:

According to the way of dataset segmentation and feature selection in this paper, it seems that the value of the feature Age is unclear. For instance, for a segment of pipe in 3-year-old during period A values 24 months, whether it means that the segment is 3-year-old at the beginning of period A or at the end of period A? Please justify it.

Our response:

Age represents the time difference in years between January 1, 2020 and the installation date of the segment. We rewrote the definition of the age feature to clarify this ambiguity.
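As a small illustration of this definition, the age feature could be computed as below. The function name `age_in_years` and the use of fractional years (dividing by 365.25 days) are assumptions; the response does not say whether age is rounded to whole years.

```python
from datetime import date

# Reference date stated in the response.
REFERENCE_DATE = date(2020, 1, 1)


def age_in_years(installation_date, reference_date=REFERENCE_DATE):
    """Years elapsed between the installation date and the reference date."""
    return (reference_date - installation_date).days / 365.25


example_age = age_in_years(date(2017, 1, 1))  # roughly three years
```

Because the reference date is fixed, the age of a segment does not depend on which of the periods A, B or C is being considered.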

R1.9. The reviewer wrote:

In the Metrics section, the metric Conover measures the ranking or the prediction fault counts versus that of actual. However, the way of ranking is a little bit confusing. For the actual fault counts vector [1,3,5,6,3,3] which ranks [0,2,4,5,2,2], the prediction fault counts vector [2,4,6,6,9,3] ranks [0,2,2,4,5,2] instead of [0,2,3,3,5,1]. Please justify the rationality of it or revise the ranking principle.

Our response:

This is a ranking-based metric, which measures the correlation between the rankings of the actual and predicted fault counts of each segment. First, segments receive a ranking based on their actual fault counts. Afterward, the segments receive a second ranking, by sorting them according to their predictions and assigning the above-mentioned ranks according to the segments' positions in the sorted list. The resulting metric is the average of the absolute differences between the two ranks of each segment. This is a well-known ranking metric in the literature*. We clarified the explanation of the ranking principle in the revised paper.

* Iman, R.L.; Conover, W.J. A distribution-free approach to inducing rank correlation among input variables. Communications in Statistics - Simulation and Computation 1982, 11, 311–334.
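The ranking procedure described in this response can be traced with the vectors from the reviewer's question. The sketch below is one reading of that description, not the authors' code: the function names are hypothetical, tied actual counts are assumed to share the mean of their 0-based positions, and ties among predictions are assumed to be broken by segment order.

```python
def average_ranks(values):
    """0-based ranks; tied values share the mean of their sorted positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def conover_style_score(actual, predicted):
    """Mean absolute difference between actual ranks and prediction-order ranks."""
    actual_ranks = average_ranks(actual)
    rank_pool = sorted(actual_ranks)
    # Stable sort: segments with equal predictions keep their original order.
    by_prediction = sorted(range(len(predicted)), key=lambda i: predicted[i])
    predicted_ranks = [0.0] * len(predicted)
    for position, segment in enumerate(by_prediction):
        # The segment at this position receives the actual rank found there.
        predicted_ranks[segment] = rank_pool[position]
    diffs = [abs(a - p) for a, p in zip(actual_ranks, predicted_ranks)]
    return sum(diffs) / len(diffs), actual_ranks, predicted_ranks


score, actual_ranks, predicted_ranks = conover_style_score(
    [1, 3, 5, 6, 3, 3], [2, 4, 6, 6, 9, 3]
)
```

Under these assumptions the actual counts rank as [0, 2, 4, 5, 2, 2] and the predictions as [0, 2, 2, 4, 5, 2], matching the vectors quoted by the reviewer. The second ranking reuses the pool of actual ranks rather than ranking the predictions independently, which is why it differs from the [0, 2, 3, 3, 5, 1] the reviewer expected.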

R1.10. The reviewer wrote:

In the Results section, the impact of geographical features is discussed. However, the features adopted to the prediction model exclude the Fault Distance and the number of GIS segments. Please justify the logic between these features and the prediction model.

Our response:

The purpose of the section about the geographical features was to measure how these features affect the data-driven models and to determine whether they help these algorithms surpass the rule-based model. Accordingly, we added those features only to the RF Regression algorithm and not to the Mekorot model, since the Mekorot model is based on experts who do not consider these features. We clarified this point in the article.

R1.11. The reviewer wrote:

Please cite the literature as below:

[1] Song H, Du S, Wang R, et al. Potential for Vertical Heterogeneity Prediction in Reservoir Basing on Machine Learning Methods[J]. Geofluids, 2020, 2020.

[2] Du S, Wang R, Wei C, et al. The connectivity evaluation among wells in reservoir utilizing machine learning methods[J]. IEEE Access, 2020, 8: 47209-47219.

[4] Zhang Q, Wei C, Wang Y, et al. Potential for prediction of water saturation distribution in reservoirs utilizing machine learning methods[J]. Energies, 2019, 12(19): 3597.

Our response:

Thank you very much for this recommendation for interesting papers. We read them with high interest and tried to link the therein-proposed methods to our field of research. We could see a link between the first and fourth above-mentioned literature and the topic of fault prediction in transmission mains and added them to the literature review. Moreover, we further extended the literature review with additional citations.

Reviewer 2 Report

The paper is interesting and well written.

A moderate check of English language is required.

Moreover a more extensive literature review is required.

Author Response

R2.1. The reviewer wrote:

A moderate check of English language is required.

Our response:

We went over the paper and fixed all typos, grammar and other minor mistakes.

R2.2. The reviewer wrote:

Moreover, a more extensive literature review is required.

Our response:

We added a significant amount of additional references.

Reviewer 3 Report

The topic is very interesting from both theoretical and practical points of view.

The manuscript has to be improved before publication in the following:

(a) The introduction includes very few recent references (just two references from 2019, 2020) and hence it does not give a clear view about recent advances in the topic

(b) The limitations of the methodology should be clearly defined. For example, from the analysis pipes with no bursts or leakages were discarded. What can you do if you have to include these pipes?

(c) The conclusions are trivial. The comparison of the proposed algorithms with the rule-based decision model should be more detailed and quantitative.

Author Response

R3.1. The reviewer wrote:

The introduction includes very few recent references (just two references from 2019, 2020) and hence it does not give a clear view about recent advances in the topic

Our response:

We added a significant amount of additional references.

R3.2. The reviewer wrote:

The limitations of the methodology should be clearly defined. For example, from the analysis pipes with no bursts or leakages were discarded. What can you do if you have to include these pipes?

Our response:

We added a section before the conclusions in which we discuss the limitations of our methodology. We focused on three key aspects: the lack of features, the focus on bursts or leakages, and the imbalance of the dataset. Regarding the specific point raised by the reviewer, predicting faults that result from causes other than bursts or leakages was not within the goals of our paper. Bursts and leakages result from natural causes such as aging of the pipes or bad materials. Other failures, such as those that occur when a pipe is hit by an external object, cannot be predicted, and for that reason we intentionally filtered out those kinds of faults.

R3.3. The reviewer wrote:

The conclusions are trivial. The comparison of the proposed algorithms with the rule-based decision model should be more detailed and quantitative.

Our response:

Upon review of the paper, we indeed noticed that our conclusions were not detailed enough. We updated the section about the evaluation of the methods to include a quantitative comparison of the data-driven algorithms to the rule-based model. Specifically, we enhanced the discussion of the RMSE, Conover, and Kendall's Tau results to provide more numeric insights. In addition, we also added these results to the abstract and to the conclusions chapter.
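For readers unfamiliar with two of the metrics named in this response, the sketch below shows how RMSE and Kendall's Tau can be computed for actual versus predicted fault counts. It is an illustration only: the function names are hypothetical, and the tau-b variant (which corrects for ties) is an assumption, since this record does not state which variant the paper uses.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted fault counts."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def kendall_tau_b(x, y):
    """Kendall's tau-b: pairwise rank agreement, corrected for ties."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    total_pairs = n * (n - 1) // 2
    denom = math.sqrt((total_pairs - ties_x) * (total_pairs - ties_y))
    # Undefined when every pair is tied in one of the inputs.
    return (concordant - discordant) / denom if denom else math.nan
```

RMSE penalizes errors in the predicted counts themselves, while Kendall's Tau only looks at whether pairs of segments are ordered the same way, so a model can do well on one and poorly on the other.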

Reviewer 4 Report

The manuscript presented for review presents the simulation results using 3 methods that allow for predicting damage to water supply systems: the segmentation methods, the fault prediction methodology and the prediction algorithms. The manuscript is more practical than scientific.

In Chapter 1, Introduction, authors present a review of the literature, its modeling and the model used in the manuscript. 13 of 15 references are cited here (from 1999 to 2020). Reviewer has no comments to this part, except for one that references to literature are in a mixed system. This should be harmonized, e.g. is Kleiner and Rajani 2011, is supposed to be Kleiner and Rajani [4].

In Chapter 2, Methodology, the authors present the Methods: Segmentation Methods, Fault Prediction Process and Prediction Algorithms. There are no references to literature. There is a sub-heading Case study in section 2.2 - this is more of an objective function definition.

Chapter 3 is divided into 2 subsections: Experimental Setup and Results, and these into subsequent sub-chapters. Couldn't be easier? Methodology, Case Study, and Results?

In chapter 4, the authors present their conclusions. Properly but without discussion with an indication of the literature.

Bibliography consists of only 15 items. Yes, there are 2020 items here, but that's not enough. Expand!

Author Response

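R4.1. The reviewer wrote:

References to literature are in a mixed system. This should be harmonized, e.g. is Kleiner and Rajani 2011, is supposed to be Kleiner and Rajani [4].
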
Our response:

Thanks for the comment. We changed the references to literature accordingly. In addition, in some cases where mentioning the authors by name is necessary for the meaning of the sentence, we added the authors' names to the text without linking them to the reference list.

R4.2. The reviewer wrote:

In Chapter 2, Methodology, the authors present the Methods: Segmentation Methods, Fault Prediction Process and Prediction Algorithms. There are no references to literature.

Our response:

We added references, but the literature discussion is introduced in the introduction in detail.

R4.3. The reviewer wrote:

There is a sub-heading Case study in section 2.2 - this is more of an objective function definition.

Our response:

We changed this heading to Objective description.

R4.4. The reviewer wrote:

Chapter 3 is divided into 2 subsections: Experimental Setup and Results, and these into subsequent sub-chapters. Couldn't be easier? Methodology, Case Study, and Results?

Our response:

The methodology is described in Section 2, so inserting it into Section 3 would be misleading. We renamed Section 3 to “Evaluation”, which better describes the content of this section. This section is divided into two subsections: “Experimental Setup”, which describes how we set up the experiments, and “Results”, which describes the results of the experiments. This flow is very common in computer science papers.

R4.5. The reviewer wrote:

In chapter 4, the authors present their conclusions. Properly but without discussion with an indication of the literature.

Our response:

We drew some conclusions with reference to the literature and added them to the conclusions chapter.

R4.6. The reviewer wrote:

Bibliography consists of only 15 items. Yes, there are 2020 items here, but that's not enough. Expand!

Our response:

In the new version, we added a significant amount of references and literature review.

Reviewer 5 Report

The paper describes the prediction method of pipe fault location in water main pipeline. The failures interrupt the distribution of water and cause the plenty of waste water. Therefore, the accurate prediction of failure points becomes very useful to the maintenance strategies of water pipelines. The prediction method is composed of Training and Test part at first. The computed results in Test part shows that the prediction is available to make a plan to repair of water transmission pipe segments.

Several additional explains are preferable for more easy understanding of the paper contents.

Author Response

R5.1. The reviewer wrote:

Several additional explains are preferable for more easy understanding of the paper contents.

Our response:

We went over the paper and added more explanations and clarifications in cases we deemed necessary.

Round 2

Reviewer 1 Report

The authors have successfully addressed my comments on the manuscript. I recommend it for publication.

Reviewer 4 Report

They accept the amendments and allow further proceedings
