Article
Peer-Review Record

A New Approach to Machine Learning Model Development for Prediction of Concrete Fatigue Life under Uniaxial Compression

Appl. Sci. 2022, 12(19), 9766; https://doi.org/10.3390/app12199766
by Jaeho Son and Sungchul Yang *
Reviewer 1: Anonymous
Submission received: 27 August 2022 / Revised: 23 September 2022 / Accepted: 24 September 2022 / Published: 28 September 2022
(This article belongs to the Special Issue Fatigue, Performance, and Damage Assessment of Concrete)

Round 1

Reviewer 1 Report

The manuscript deals with some machine learning models to predict the fatigue life of plain concrete specimens under cyclic compression. Experimental results presented in (Zhang et al., 2021) are processed and used to train the adopted machine learning models. Conclusions are drawn on the basis of the prediction capabilities of the ML models on the fatigue life.

The topic is interesting and the paper is fairly well written. The manuscript can be accepted for publication provided that the following remarks are considered by the Authors in the revised version of the paper:

1. The Authors should clearly state the novelty aspects of their paper in relation to that of Zhang et al., 2021;

2. Shape of specimens is obviously identified by the Authors using a number flag. However, presenting mean, median, etc. results (e.g., see Table 1) for the shape flag seems to be meaningless. Please clarify;

3. Section 5: the link to Orange Software (https://orangedatamining.com/widget-cat-alog/) is not working;

4. The labels and figures of the graphs in Fig. 12 are not readable. Please improve the readability of this figure and, in general, check that of all the figures in the manuscript.

Author Response

Response to Reviewer 1:

 

Our revised paper was rechecked by a professional English editing service. Our point-by-point responses follow.

1) The Authors should clearly state the novelty aspects of their paper in relation to that of Zhang et al., 2021;

→ a) As the reviewer requested, we clearly stated the novelty aspects of our paper in relation to that of Zhang et al., 2021. The two relevant paragraphs were rearranged and revised; the original text read:

In this study, 1300 samples of experimental data [5-33] of concrete fatigue tests originally carried out by Zhang et al. [4] were treated using four kinds of machine learning models (Artificial Neural Network, Random Forest, and the Gradient Boosting and AdaBoost method). This study adopts six independent values, excluding only the sustained strength of concrete variable from Zhang et al. [4]. Three data files were generated to compare the actual number of fatigue repetition values (logN) against the predicted values (logN). The first data uses the entire original dataset, while the second data uses grouping data with the same input variable value and the third data excludes outliers.

Due to the nature of the fatigue strength test, outliers can commonly occur. In statistics, an outlier is a data point that differs significantly from other observations [63-64]. An outlier may be due to variability in the measurement or it may indicate experimental error; the latter is sometimes excluded from the data set. There are various methods of outlier detection, such as Grubbs's test [63], Chauvenet's criterion [65], Peirce's criterion [66], Dixon's Q-test [67], the generalized extreme studentized deviate test [68], the Thompson-Tau test [69], and the IQR test [70-71]. In this study, Chauvenet's criterion, Peirce's criterion, the Thompson-Tau criterion, and the IQR method were adopted to remove outliers.

The revised text reads:

Due to the nature of the fatigue strength test, outliers can occur remarkably often in this test compared to tests of other material strength properties. In statistics, an outlier is a data point that differs significantly from other observations [63-64]. An outlier may be due to variability in the measurement or it may indicate experimental error; the latter is sometimes excluded from the data set. There are various methods of outlier detection, such as Grubbs's test [63], Chauvenet's criterion [65], Peirce's criterion [66], Dixon's Q-test [67], the generalized extreme studentized deviate test [68], the Thompson-Tau test [69], and the IQR test [70-71].

In this study, 1300 samples of experimental data [5-33] from concrete fatigue tests originally collected by Zhang et al. [4] were treated using four kinds of machine learning models (Artificial Neural Network, Random Forest, Gradient Boosting, and AdaBoost). Unlike previous studies, this research adopts six independent variables, excluding only the sustained-strength-of-concrete variable used by Zhang et al. [4]. For our approach, three data files were generated to compare the actual number of fatigue repetition values (logN) against the predicted values (logN). The first uses the entire original dataset, which was treated by Zhang et al. [4]. However, unlike Zhang et al. [4], our research adds a second dataset with the grouped data and a third dataset which excludes outliers. In this work, Chauvenet's criterion, Peirce's criterion, the Thompson-Tau criterion, and the IQR method were adopted to remove outliers. Finally, a Permutation Feature Importance (PFI) analysis was carried out to determine which input variables are most and least important to the fatigue life model. Our novel approach allows better fatigue life prediction than Zhang et al. [4]'s approach.

b) For the novelty aspects, in the last paragraph we added phrases such as "Unlike previous studies," "For our approach," "unlike Zhang et al. [4]," and "Our novel approach allows better fatigue life prediction than Zhang et al. [4]'s approach."
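For illustration, the IQR screen mentioned above can be sketched in a few lines of Python; the group values and the conventional 1.5 factor below are illustrative assumptions, not the study's actual data or settings.

```python
# Hedged sketch: IQR-based outlier screening for one group of fatigue lives
# that share identical inputs. The values are hypothetical, not the study data.
import numpy as np

def iqr_outlier_mask(values, k=1.5):
    """True where a point falls outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# One group of repeated tests; screening is done on logN, as in the paper.
group_logN = np.log10([1200, 1450, 1510, 22570])
print(iqr_outlier_mask(group_logN))  # -> [False False False  True]
```

The other three criteria (Chauvenet, Peirce, Thompson-Tau) follow the same pattern: compute a statistic per point and compare it against a tabulated or distribution-based threshold.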

 

2) Shape of specimens is obviously identified by the Authors using a number flag. However, presenting mean, median, etc. results (e.g. see Table 1) for shape flag seems to be meaningless. Please clarify;

 

→ As the reviewer points out, shape is a categorical feature (variable) rather than a numerical one. Therefore, we added a note to the table stating that shape is a categorical variable.

As a footnote of Table 1, the following was added, “Since shape is a categorical variable, the statistical features expressed in the table may not be meaningful.”

 

3) Section 5: the link to Orange Software (https://orangedatamining.com/widget-catalog/) is not working;

→ As pointed out, the link to Orange Software was corrected from '(https://orangedatamining.com/widget-cat-alog/)' to '(https://orangedatamining.com/widget-catalog/)'. We suspect this happened during the editorial preparation process at MDPI, where a hyphen was inserted at the end of a line, splitting a single word.

 

4) The labels and figures of the graphs in Fig. 12 are not readable. Please improve the readability of this figure and in general check that of all the figures in the manuscript.

→ As the reviewer points out, the readability of labels and graphs in Fig. 12 as well as in Figs. 1 & 2 was improved.

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper's overall quality is adequate for the journal, and the research may be of interest to the scientific community. However, in the reviewer's opinion, there are several aspects that must be checked or amended for the paper to be considered for publication. The most important concern the introduction, which should be partially rewritten to provide a better overall understanding of the studied phenomenon (concrete under cyclic loading), and the results, in which the best results are found for average data excluding outliers but the comparison between methods with grouped and ungrouped data is confusing. In the following lines, these aspects, together with some formal remarks, are collected.

 

ABSTRACT.

Line 15. Please check spelling of "Excluding".

Line 23. Please check spelling of "Analysis", and check grammar of the sentence.

Line 25. Please modify "height and width ratio" to "height to width ratio" or similar, to prevent misreading. 

Line 26. "A very weak" is a rather vague expression for an abstract. Please consider rewording to "a negligible influence", " a non-significant influence", or similar. 

INTRODUCTION.

The first paragraph of the introduction is quite confusing. It does not clearly state whether there would be a benefit of cyclic loading (higher maximum compressive load, but reduction in durability and fatigue failure?). Please explain further these points, and the relation with the amplitude of the cyclic loading, according to literature. This is partially explained later in the text, so please consider reorganizing ideas, as great differences exist among the cases presented in the first paragraph (live load in buildings, traffic loads, etc.).

Line 70, please check grammar.

3. DATA PREPARATION.

Line 165. Please check the sentence.

6. RESULTS AND DISCUSSION

6.3. Should it be "model developed with grouped (or Average, as in the conclusion section) Data excluding outliers"? In this section, it should be clarified whether the MSE, RMSE, MAE, and R2 are compared to the average values of the data or to the global data, and the consequences should be explained. The same applies to Figure 12: it is not clear to the reviewer whether the observed data in the graphs are average or individual values. Its scatter is relatively high and, if the model does not take it into account, this may lead to better results than the actual comparison with experimental values.

Author Response

Response to Reviewer 2:

 

Abstract

1) Line 15. Please check spelling of "Excluding".

→ As pointed out, 'Exculding' was changed to 'Excluding.'

2) Line 23. Please check spelling of "Analysis", and check grammar of the sentence.

→ As pointed out, 'Anaysis' was changed to 'Analysis,' and the sentence was changed as follows.

"Analysis results showed that the maximum stress level (Smax) and loading frequency (f) are were very important the most significant input variables, followed by compressive strength (f‘c) and maximum to minimum stress ratio (R)."

3) Line 25. Please modify "height and width ratio" to "height to width ratio" or similar, to prevent misreading.

→ As pointed out, 'height and width ratio' was changed to 'height to width ratio.'

4) Line 26. "A very weak" is a rather vague expression for an abstract. Please consider rewording to "a negligible influence", "a non-significant influence", or similar.

→ As pointed out, 'very weak' was changed to 'non-significant.'

Introduction

5) The first paragraph of the introduction is quite confusing. It does not clearly state whether there would be a benefit of cyclic loading (higher maximum compressive load, but reduction in durability and fatigue failure?). Please explain further these points,

→ As pointed out, the second sentence was incorrectly worded. The sentence was modified as follows.

"It is commonly known that concrete strength has a higher maximum compressive load under repeated loading will be lower than that under static loading [1-2]. "

6) Line 70, please check grammar.

→As suggested, the following sentence, "Basic properties of concrete that have been predicted using the ML method include the compressive strength of normal concrete [36-41], high performance concrete (HPC) [38,42-44], industrial wastes including supplementary cementitious materials (SCMs) [45-54], recycled aggregate (RA) [52,55-57], geopolymer concrete [58-59], and fibers [53]. " was modified to "Main concrete property that has been predicted using the ML method is the compressive strength of various concretes such as normal concrete [36-41], high performance concrete (HPC) [38,42-44], concrete with industrial wastes including supplementary cementitious materials (SCMs) [45-54], recycled aggregate (RA) concrete [52,55-57], geopolymer concrete [58-59], and concrete with fibers [53]. "

3. DATA PREPARATION.

7) Line 165. Please check sentence

→ According to the reviewer's request, the sentence was modified as follows.

"As a result of applying the four outlier detection methodologies for the data in Group 1, N value of 22,570 (see Table 4) was detected as an outlier by all four methods and finally selected as an outlier. We performed all four of these methodologies on each group of data to determine which values were detected as outliers. All four of these outlier detection methodologies detected the N value of 22,570 (see Table 4) as an outlier for the Group 1 data. On the other hand, in for the data in Group 2, the N value of 1,571 (see Table 4) was detected as an outlier only by the Thompson-Tau methodology, but was not detected as an outlier by the other three methodologies."

  1. RESULTS AND DISCUSSION

8) 6.3. Should it be "model developed with grouped (or Average, as in the conclusion section) Data excluding outliers?"

→ As requested, the section title was changed to "Model developed with Average Data excluding outliers." In addition, for clarity, Section 3, Data Preparation, was rearranged from 4 procedures to 3.

"1.ORIGINAL DATA: Data used in Zhang's paper, directly collected by the authors of from papers [5] - [33]. This is used as the reference data for this study. A total of 1,298 data were collected, and The statistical features such as the mean, median, dispersion, minimum, and maximum values of this data independent and dependent variables are summarized in Table 1. The ORIGINAL DATA was grouped by the same input variable value.

2.GROUPED DATA: Data obtained by grouping data with the same input variable value and different output variable logN value. This data serves as a basis for selecting and eliminating outliers.

3.2. DATA Excluding OUTLIERS: If there are outliers in the group, this is data created after removing them. This data is used as a basis for determining the average value after removing the outliers. A total of 1,252 data were generated. The Sstatistical features such as the mean, median, dispersion, minimum and maximum values of independent and dependent variables values of this data are summarized in Table 2.

4.3. AVERAGE DATA Excluding OUTLIERS: This is the data created by averaging the grouped data after excluding outliers from among the grouped data. In this process, the total number of data was reduced to 310. The Sstatistical features such as the mean, median, dispersion, minimum and maximum values of independent and dependent variables values of this data are summarized in Table 3."

9) In this section, it should be clarified whether the MSE, RMSE, MAE and R2 are compared to the average values of data or with the global data, and explain the consequences.

→According to the reviewer's request, the sentence was modified.

"Compared to the analysis results shown based on ML models with data excluding outliers (Table 9), predicted values were better correlated with actual values in the Gradient Boosting, Random Forest, and AdaBoost models, but not in the Neural Network model. Three sets of data were used to develop the ML models in this study. The MSE, RMSE, MAE, and R2 calculated with the average data excluding outliers were compared to the MSE, RMSE, MAE, and R2 calculated with both the original data and the grouped data excluding outliers. As a result of comparing the values in Tables 8, 9, and 10, the ML model developed with average data excluding outliers most closely matched the predicted value and the observed value."

10) The same applies to Figure 12: it is not clear to the reviewer whether the observed data in the graphs are average or individual values. Its scatter is relatively high and, if this model does not take it into account, this may lead to better results than the actual comparison with experimental values.

→According to the reviewer's request, the paragraph was modified.

"Figure 12 depicts actual values against the predicted values of logN for machine learning models developed with the average data excluding outliers. The results of the Gradient Boosting model are better fit to a straight line than other ML models, which indicates that the Gradient Boosting model is more accurate for predicting the logN. The scattered data of the Gradient Boosting model is closer to the linear regression line than the scattered data of other models. Compared to the other models, the scatter plot of the Neural Network model does not fit well and its prediction is slightly off, which has a larger dispersity of scatter points. Among 4 ML models developed with the average data excluding outliers Gradient Boosting Model most closely fits the observed data."

Author Response File: Author Response.pdf

Reviewer 3 Report

With the machine learning models (the random forest, neural network, gradient boosting and adaboost), the authors forecasted the fatigue life of plain concrete under uniaxial compression and got many results from the data excluding outliers. However, there are some problems that need to be checked.

1. Compared with reference [4], the authors selected the same experimental data of 1300 samples; reference [4] also discussed the residual strength of concrete under fatigue loading with machine learning models such as random forest and neural network. The authors should state the difference between the current study and reference [4].

2. Machine learning models have been used to analyze the fatigue life of concrete for many years; the authors can select several references to compare with the current study.

3. On page 1, lines 34-35: concrete has a higher maximum compressive load under repeated loading than under static loading — why?

4. On page 7, line 208, shape has a weak positive relationship; however, in Table 7, according to the Spearman correlation coefficient, the shape value is negative (-0.020). Please check it.

5. On page 18, references [84] and [85] cannot be opened.

Author Response

Response to Reviewer 3:

 

With the machine learning models (the random forest, neural network, gradient boosting and adaboost), the authors forecasted the fatigue life of plain concrete under uniaxial compression and got many results from the data excluding outliers. However, there are some problems that need to be checked.

1. Compared with reference [4], the authors selected the same experimental data of 1300 samples; reference [4] also discussed the residual strength of concrete under fatigue loading with machine learning models such as random forest and neural network. The authors should state the difference between the current study and reference [4].

→ As pointed out, to highlight our novelty relative to reference [4], we stressed the differences using phrases such as "Unlike previous studies," "For our approach," "unlike Zhang et al. [4]," and "Our novel approach allows better fatigue life prediction than Zhang et al. [4]'s approach." The paragraph was modified as follows. It should also be noted that we rearranged the two paragraphs by bringing the last paragraph forward.

"In this study, 1300 samples of experimental data [5-33] of concrete fatigue tests originally carried out by Zhang et al. [4] were treated using four kinds of machine learning models (Artificial Neural Network, Random Forest, and the Gradient Boosting and AdaBoost method). Unlike previous studies, this research adopts six independent values, excluding only the sustained strength of concrete variable used from Zhang et al. [4]. For our approach, three data files were generated to compare the actual number of fatigue repetition values (logN) against the predicted values (logN). The first data uses the entire original dataset, which was treated by Zhang et al. [4]. However, unlike Zhang et al. [4], our research adds the second data with the grouping data and the third data which excludes outliers. In this work, Chauvenet’s criterion, Pierce’s criterion, Thompson-Tau criterion and the IQR method were adopted to remove outliers. Finally, a Permutation Feature Importance (PFI) analysis was carried out to determine which input variables are most critical or minor on the fatigue life model. Our novel approach allows better fatigue life prediction than Zhang et al. [4]’s approach."

 

2. Machine learning models have been used to analyze the fatigue life of concrete for many years; the authors can select several references to compare with the current study.

→ We found some references related to "concrete fatigue + machine learning," as follows.

  1. A novel study for the estimation of crack propagation in concrete using machine learning algorithms, Construction and Building Materials, 2019. → This paper deals with crack propagation detection.
  2. Fast fatigue method for self-compacting recycled aggregate concrete characterization, Journal of Cleaner Production, 2020. → This paper does not cover machine learning.
  3. Probabilistic machine learning approach to bridge fatigue failure analysis due to vehicular overloading, Engineering Structures, 2019. → This paper deals with fatigue failure, but for a composite bridge structure made with a concrete deck and steel girders.
  4. Early damage detection of fatigue failure for RC deck slabs under wheel load moving test using image analysis with artificial intelligence, Engineering Structures, 2021. → This paper deals with crack propagation detection.
  5. Remaining fatigue life assessment of in-service road bridge decks based upon artificial neural networks, Engineering Structures, 2018. → This paper deals with fatigue life prediction for RC decks.
  6. Multiaxial Fatigue Life Assessment of Integral Concrete Bridge with a Real-Scale and Complicated Geometry Due to the Simultaneous Effects of Temperature Variations and Sea Waves Clash, Materials, 2021. → This paper deals with the fatigue life of steel piles on an integral concrete bridge.

However, we found a Materials journal paper titled "ANN-Based Fatigue Strength of Concrete under Compression" by M. Abambres and E. Lantsoght, 2019. Thus, the following was added to the text.

"In 2019, ANN-based concrete fatigue strength model was proposed by Abambres and Lantsoght [63]. They used 203 data points gathered from the literature. Predicted values analyzed from the ANN model were compared to the existing code expressions. Their ANN model includes the compressive strength of concrete, maximum stress level, and minimum stress level."

3. On page 1, lines 34-35, "concrete has a higher maximum compressive load under repeated loading than under static loading" — why?

→ As pointed out, the second sentence was wrongly described. The sentence was modified as follows.

"It is commonly known that concrete strength has a higher maximum compressive load under repeated loading will be lower than that under static loading [1-2]. "

4. On page 7, line 208, "shape has a weak positive"; however, in Table 7, according to the Spearman correlation coefficient, the shape value is negative (-0.020). Please check it.

→ As pointed out, the sentences were modified as follows.

"According to the Pearson correlation coefficient, Smax and logN have a solid negative negative solid linear relationship, while f has a positive and, R, and shape seem to have has a negative moderate some linear relationship with logN. f‘c , shape, and h/w have a non-significant linear relationship with logN. According to the Spearman correlation coefficient, f has a solid positive; Smax has a solid negative; Smax has a negative and f has a positive significant; shape has a weak positive; and f‘c has a weak negative f‘c has a negative moderate; R, shape, and h/w have a negligible monotonic relationship with logN."

5. On page 18, references [84] and [85] cannot be opened.

→ As pointed out, references [85] and [86] were changed as follows. Please note that a new reference [63] was added, so the former references [84] and [85] are now [85] and [86]. We suspect the broken links arose during the editorial preparation process at MDPI, where a hyphen was inserted at the end of a line, splitting a single word; we will make the editorial office aware of this. In addition, for ref. [85], the 'ko-kr' (Korean) version was changed to the 'en-us' (English) version.

[84]https://support.minitab.com/ko-kr/minitab/18/help-and-how-to/statistics/basic-statistics/supporting-topics/correlation-and-covariance/a-comparison-of-the-pearson-and-spearman-correlation-methods/

[85]https://support.minitab.com/en-us/minitab/18/help-and-how-to/statistics/basic-statistics/supporting-topics/correlation-and-covariance/a-comparison-of-the-pearson-and-spearman-correlation-methods/

[86]https://towardsdatascience.com/clearly-explained-pearson-v-s-spearman-correlation-coefficient-ada2f473b8

Author Response File: Author Response.pdf

Reviewer 4 Report

Please find the attached file. 

Comments for author File: Comments.pdf

Author Response

1. Due to the use of four approaches for prediction, no optimizer has been used. Without using an optimizer, and considering the large number of models, how did the authors make sure of tuning the hyperparameters of each approach?!

→ Unfortunately, the Orange 3 software used for this study does not have an optimizer function that automatically finds the hyper-parameters of a model. Thus, starting with the default parameters provided by Orange 3, the authors manually adjusted the parameters to generate feasible output from each ML model. In a future study, we will use a program that provides an optimizer to further improve the accuracy of the models. Accordingly, the authors modified the manuscript by adding two sentences, as follows.

 

"The schematic model developed using Orange Software is presented in Fig. 7, and the specific parameters of each proposed model are shown in Figs. 8-11. Unfortunately, the Orange 3 software used for this study does not have an optimizer function that automatically finds the hyper-parameters of the model. Thus, starting with the default parameters provided by Orange 3, authors manually adjust the parameters to generate feasible output of each ML model."

 

2. It can almost be mentioned that, without using an optimizer, it is impossible to ensure the correctness of the hyperparameter values. The authors should justify this critical concern reasonably.

→ The authors answered this question in Comment 1.

 

3. The following papers can be considered in writing the main text (introduction, materials and methods) of the manuscript. The presentation of the manuscript is so weak.

→ We appreciate your suggestion. With the help of those references and the other comments, we modified Section 1, Introduction; Section 2, Input and Output Data; and Section 3, Data Preparation for the Developed Model. In particular, the meaning of the symbols for the independent and dependent variables was further explained in Section 2, as follows.

"The dataset used in this study is summarized below."

f'c: the compressive strength of concrete in MPa

h/w: height to width ratio of the tested specimens

Shape: shape of the test specimens

Smax: maximum stress level

R: minimum stress to maximum stress ratio

f (Hz): loading frequency in Hz

LogN: logarithm of the number of cycles to failure of the specimen
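Given this schema, the grouping and averaging steps described in Section 3 can be sketched with pandas; the file name and column labels below are assumptions for illustration, not the authors' actual files.

```python
# Hedged sketch: build the AVERAGE data from records sharing the same six inputs.
import pandas as pd

cols = ["fc", "h_w", "shape", "Smax", "R", "f", "logN"]  # hypothetical labels
df = pd.read_csv("fatigue_data.csv", header=0, names=cols)

inputs = cols[:-1]  # the six independent variables
average_data = df.groupby(inputs, as_index=False)["logN"].mean()
print(len(df), "records ->", len(average_data), "averaged groups")
```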

 

4. The interpretation of the collected data has not been done in an acceptable approach.

→ In line with the reviewer's opinion, the sections on the collected data were modified as follows. Additionally, to help the reader understand, the Section 3 (Data Preparation for the Developed Model) process was rearranged from 4 procedures to 3.

"1.ORIGINAL DATA: Data used in Zhang's paper, directly collected by the authors of from papers [5] - [33]. This is used as the reference data for this study. A total of 1,298 data were collected, and The statistical features such as the mean, median, dispersion, minimum, and maximum values of this data independent and dependent variables are summarized in Table 1. The ORIGINAL DATA was grouped by the same input variable value.

2.GROUPED DATA: Data obtained by grouping data with the same input variable value and different output variable logN value. This data serves as a basis for selecting and eliminating outliers.

3.2. DATA Excluding OUTLIERS: If there are outliers in the group, this is data created after removing them. This data is used as a basis for determining the average value after removing the outliers. A total of 1,252 data were generated. The Sstatistical features such as the mean, median, dispersion, minimum and maximum values of independent and dependent variables values of this data are summarized in Table 2.

4.3. AVERAGE DATA Excluding OUTLIERS: This is the data created by averaging the grouped data after excluding outliers from among the grouped data. In this process, the total number of data was reduced to 310. The Sstatistical features such as the mean, median, dispersion, minimum and maximum values of independent and dependent variables values of this data are summarized in Table 3."

 

5. The statistical analysis of the data is incomplete.

→ As requested, the manuscript has been revised on page 8, as follows, to reflect your comments.

"According to the Pearson correlation coefficient, Smax and logN have a solid negative negative solid linear relationship, while f has a positive and, R, and shape seem to have has a negative moderate some linear relationship with logN. f‘c , shape, and h/w have a non-significant linear relationship with logN. According to the Spearman correlation coefficient, f has a solid positive; Smax has a solid negative; Smax has a negative and f has a positive significant; shape has a weak positive; and f‘c has a weak negative f‘c has a negative moderate; R, shape, and h/w have a negligible monotonic relationship with logN."

 

6. The data used are discrete and may lead to model divergence. How has this been investigated?

→ As the reviewer points out, if the input variables of a machine learning model are discrete, the model may diverge due to insufficient diversity in the input data. It would be best to provide a convergence curve showing that the model does not diverge. However, the Orange 3 software used for this study does not generate convergence curves for the ML models. Instead, we verified that the models do not diverge by checking that the training and test results of the ML models show plausible R2 values.

 

7. The optimal hyperparameters of each model should be presented and explained.

→ The manually adjusted parameters of each proposed model are shown in Figs. 8-11. This question was answered in Comment 1. To reflect the reviewer's comment, an explanation of the adjusted parameters was added to the manuscript as follows.

"The schematic model developed using Orange Software is presented in Fig. 7, and the specific parameters of each proposed model are shown in Figs. 8-11. Unfortunately, the Orange 3 software used for this study does not have an optimizer function that automatically finds the hyper-parameters of the model. Thus, starting with the default parameters provided by Orange 3, authors manually adjust the parameters to generate feasible output of each ML model."

"In order to develop a ANN network model, the user has to set several important parameters which are as follows. The number of hidden layers is set to 2, and there are 7 and 8 neurons in each hidden layer, as shown in Fig.8. The rectified linear unit function is selected as the activation function for hidden layer. As a solver for weight optimization, a stochastic gradient-based optimizer called Adam is used. As a regularization parameter, commonly called alpha, 0.0004 is used. Replicable training is allowed."

Figure 8. Parameters of the proposed ANN Model [79].

"In order to develop a Random Forest model, the user has to set several important parameters which are as follows. As shown in Fig.9, 50 decision trees will be included in the forest. 4 attributes will be arbitrarily drawn for consideration at each node. Replicable training was permitted, while balance class distribution was not. The limit depth of individual trees has not been determined. Select 5 subset as the smallest subset that can be split."

 

 

Figure 9. Parameters of the proposed Random Forest Model [79].

"In order to develop a Gradient Boosting model, the user has to set several important parameters which are as follows. As shown in Fig.10, 150 gradient boosted trees are specified. A larger number usually results in better performance. The boosting rate is set to 0.2. Replicable training is allowed. The maximum depth of the individual tree is set to 4. Select 3 subset as the smallest subset that can be split. Fraction of training instances is set to 1. It Specify the percentage of the training instances for fitting the individual tree."

 

Figure 10. Parameters of the proposed Gradient Boosting Model [79].

"In order to develop a AdaBoost model, the user has to set several important parameters which are as follows. The number of estimator is set to 50, as shown in Fig.11. The learning rate is set to 1. It determines to what extent the newly acquired information will override the old information. The number of 1 means that the agent considers only the most recent information. The number of 3 is set as a fixed seed to enable reproducing the results. It is decided to use SAMME as the classification algorithm, which updates base estimator’s weights with classification results. Among the regression loss function options, the Linear option is selected."

Figure 11. Parameters of the proposed AdaBoost Model [79].
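Because Orange 3 wraps scikit-learn, the manually tuned settings quoted above can be approximated in plain scikit-learn as below. This is a sketch of roughly equivalent configurations, not the authors' exact setup; note that the SAMME option belongs to AdaBoost classification, so only the regression-relevant settings are mirrored here.

```python
# Hedged scikit-learn approximations of the four Orange 3 configurations above.
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)

models = {
    "Neural Network": MLPRegressor(hidden_layer_sizes=(7, 8), activation="relu",
                                   solver="adam", alpha=0.0004, random_state=3),
    "Random Forest": RandomForestRegressor(n_estimators=50, max_features=4,
                                           min_samples_split=5, random_state=3),
    "Gradient Boosting": GradientBoostingRegressor(n_estimators=150,
                                                   learning_rate=0.2, max_depth=4,
                                                   min_samples_split=3,
                                                   subsample=1.0, random_state=3),
    "AdaBoost": AdaBoostRegressor(n_estimators=50, learning_rate=1.0,
                                  loss="linear", random_state=3),
}
```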

 

8. A convergence curve for each approach should be presented.

→ Unfortunately, the Orange 3 software used for this study does not generate convergence curves for the ML models.

 

9. On what basis was 90% of the data selected for training?! This choice usually leads to overtraining of the models.

→ To reflect the reviewer's comment, the following was added in Section 6.2.

"For training and testing of the model, 90% of the total data was used for training and 10% was used for testing. 90-10, 85-15, and 80-20 are the ratios of the most used training and testing data. When developing an ML model using average data, the number of data is reduced. Therefore, in order to secure as much training data as possible, a ratio of 90-10 was used."

 

10. Comprehensive and new evaluation indexes such as the PI and A-10 index must be used to check the accuracy of the model.

→ The authors tried to apply the PI and A-10 index according to the reviewer's suggestion, but it was difficult to find information about them. Instead, we used MSE, RMSE, MAE, and R2, which evaluate the accuracy of the model. We ask the reviewer for their understanding.

 

11. The results and discussion section needs to be reviewed and rewritten; it is not acceptable in its current form.

→ To reflect the reviewer's comments related to the results and discussion sections, Sections 3, 5, 6, and 7 were modified substantially. Accordingly, Section 7, Conclusion, was modified as follows.

"The goal of this work was to show how ML models can be used to forecast the fatigue life (N) of plain concrete under uniaxial compression. The fatigue life was forecasted using Random Forest, Neural network, Gradient Boosting, and AdaBoost Models. The models were developed sequentially using three data sets. The first was developed with original data, the second was developed with outliers removed, and the last model was developed with the average value of data with different outputs in the same input. For training and testing of the models, a ratio of training and testing was used as 90-10 in order to secure as much training data as possible. From this, we were able to make the following conclusions."

 

12. The figure quality is low and should be improved.

→ Reviewer 1 also asked us to modify some figures. Thus, the readability of the labels and graphs in Figures 1, 2, and 12 was improved.

 

13. The results of training and testing should be presented separately so that the model can be evaluated properly.

→ According to the reviewer's suggestion, the results of training and testing are presented separately in the manuscript, and the descriptions in Section 6 were modified as follows.

a) In Section 6-1: "The four ML models (Random Forest, Neural Network, Gradient Boosting, and AdaBoost) were run, and the results of training and testing for each model are shown in Tables 8-1 and 8-2."
b) In Section 6-1: "Moreover, Tables 8-1 and 8-2 show that the Gradient Boosting model, with the minimum error values and a high R2 value, gives an indication of high accuracy in predicting outcomes."
c) Table 8-1 was newly added, while Table 8 in the first version now becomes Table 8-2, as follows.

Table 8-1. Result of ML models with training original data.

Model             | MSE   | RMSE  | MAE   | R2
Random Forest     | 0.351 | 0.592 | 0.411 | 0.768
Neural Network    | 0.551 | 0.742 | 0.547 | 0.635
Gradient Boosting | 0.334 | 0.578 | 0.393 | 0.779
AdaBoost          | 0.341 | 0.584 | 0.389 | 0.774

Notes) MSE: mean squared error, RMSE: root mean squared error, MAE: mean absolute error, R2: coefficient of determination

 

Table 8-2. Result of ML models with testing original data.

Model             | MSE   | RMSE  | MAE   | R2
Random Forest     | 0.312 | 0.559 | 0.402 | 0.740
Neural Network    | 0.416 | 0.645 | 0.461 | 0.655
Gradient Boosting | 0.297 | 0.545 | 0.390 | 0.753
AdaBoost          | 0.315 | 0.561 | 0.389 | 0.738

Notes) MSE: mean squared error, RMSE: root mean squared error, MAE: mean absolute error, R2: coefficient of determination

 

d) In Section 6-2: "The four machine learning models (Random Forest, Neural Network, Gradient Boosting, and AdaBoost) were run, and the results of training and testing for each model are shown in Tables 9-1 and 9-2."
e) In Section 6-2: "As shown in Table 9-1, the Gradient Boosting model with training data provides the highest determination coefficient, R2 = 0.809, followed by R2 = 0.805 from the AdaBoost model and 0.795 from the Random Forest model. The Neural Network gave the lowest R2 value at 0.668. As shown in Table 9-2, with testing data the Gradient Boosting model provides the highest determination coefficient, R2 = 0.803, followed by R2 = 0.794 from the AdaBoost model and 0.791 from the Random Forest model. The Neural Network gave the lowest R2 value at 0.726."
f) Table 9-1 was newly added, while Table 9 in the first version now becomes Table 9-2, as follows.

 

Table 9-1. Result of ML models with training data excluding outliers.

Model             | MSE   | RMSE  | MAE   | R2
Random Forest     | 0.296 | 0.544 | 0.379 | 0.795
Neural Network    | 0.479 | 0.692 | 0.510 | 0.668
Gradient Boosting | 0.275 | 0.524 | 0.359 | 0.809
AdaBoost          | 0.282 | 0.531 | 0.355 | 0.805

 

Table 9-2. Result of ML models with testing data excluding outliers.

Model             | MSE   | RMSE  | MAE   | R2
Random Forest     | 0.321 | 0.566 | 0.414 | 0.791
Neural Network    | 0.419 | 0.647 | 0.500 | 0.726
Gradient Boosting | 0.301 | 0.549 | 0.397 | 0.803
AdaBoost          | 0.315 | 0.561 | 0.417 | 0.794

 

g) In Section 6-3: "The four machine learning models (Random Forest, Neural Network, Gradient Boosting, and AdaBoost) were run, and the results of training and testing for each model are shown in Tables 10-1 and 10-2."
h) In Section 6-3: "As tabulated in Table 10-1, the Gradient Boosting model with training data provides the highest determination coefficient, R2 = 0.982, followed by R2 = 0.973 from AdaBoost and 0.887 from the Random Forest model. The Neural Network model showed the lowest R2 value at 0.679. As tabulated in Table 10-2, with testing data the Gradient Boosting model provides the highest determination coefficient, R2 = 0.915, followed by R2 = 0.893 from the Random Forest model and 0.876 from the AdaBoost model. The Neural Network model showed the lowest R2 value at 0.730."
i) Table 10-1 was newly added, while Table 10 in the first version now becomes Table 10-2, as follows.

 

Table 10-1. Result of ML models with training average data excluding outliers.

Model             | MSE   | RMSE  | MAE   | R2
Random Forest     | 0.175 | 0.418 | 0.303 | 0.887
Neural Network    | 0.495 | 0.704 | 0.534 | 0.679
Gradient Boosting | 0.027 | 0.166 | 0.094 | 0.982
AdaBoost          | 0.041 | 0.204 | 0.101 | 0.973

 

Table 10-2. Result of ML models with testing average data excluding outliers.

Model             | MSE   | RMSE  | MAE   | R2
Random Forest     | 0.145 | 0.381 | 0.288 | 0.893
Neural Network    | 0.367 | 0.606 | 0.493 | 0.730
Gradient Boosting | 0.115 | 0.339 | 0.280 | 0.915
AdaBoost          | 0.168 | 0.410 | 0.304 | 0.876

 

14. The number of used and introduced data is not consistent with Figure 12.

→ Among the average data, the number of test data is 31 (10% of the 310 averaged data), and this number is correct.

 

15. Figure 14 should be presented separately based on the training and testing datasets.

→ According to the reviewer's suggestion, a new Figure 13 was created presenting the results for the training and testing datasets, and the following was added to the manuscript. Fig. 13 in the first version now becomes Fig. 15.

"The result of developed model using the training average data and testing average data was shown in Figure 13. Gradient Boosting Model has the highest value of R2 with the training dataset and testing dataset."

Figure 13. R2 value (training vs testing).

 

16. The classification done in Sections 6-1 to 6-3 is completely dumb and incomprehensible, and confuses the reader.

→ The authors think that these sections are necessary to show that the ML models using the average data (6-3) produce better results than the ML models using the other data (6-1 and 6-2).

 

17. A comparison table of the used models based on scoring for the two parts, training and testing, should be provided, and the final score and rank of each model should be presented.

→ The authors answered this question in Comment 13.

 

18. A Taylor diagram and box plot should be presented for comprehensively evaluating the approaches.

→ The authors created a new box plot in Fig. 14 to address the reviewer's suggestion, and some descriptions were added in the text as follows.

" Box plots in Fig.14 show the range of ouptput value from each ML model and the observed output value with testing average data."

Fig. 14. Box plot for the ML models with the testing average data.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The paper was improved according to the reviewer's suggestions.

Author Response

We appreciate your comments.

Reviewer 3 Report

I am satisfied with the revision.

Reference [86] still cannot be opened; please check it.

Author Response

We have rechecked it again in both Chrome and Microsoft Edge, and it works in both browsers. We will make the editorial office aware of this.

[86]

https://towardsdatascience.com/clearly-explained-pearson-v-s-spearman-correlation-coefficient-ada2f473b8

 

Reviewer 4 Report

I thank the authors for answering and addressing my comments; however, some minor issues remain that should be addressed accurately:

In comment 3, if you use these references you should cite them. There are also many irrelevant references that can be removed.

In comment 3, your answer did not satisfy me. The authors should either perform a sensitivity analysis with respect to different percentages of training and testing data, or use valid references for this purpose. In this case, the authors are not allowed to apply personal ideas!

In comments 10 and 18, the authors did not address the comments. They have stated that "Authors tried to apply the PI and A-10index according to the reviewer's suggestion, but it was difficult to find information about them". All these indexes are introduced in the reference paper ("Predicting resilient modulus of flexible pavement foundation using extreme gradient boosting based optimised models." International Journal of Pavement Engineering (2022): 1-20.). The box plot is also wrongly drawn. The Taylor diagram is also ignored!

 

Author Response

Response to Reviewer 4:

 

1. In comment 3, if you use these references you should cite them. There are also many irrelevant references that can be removed.

→ As suggested, we added a new reference [84], and a new Section 6.5, Comprehensive Model Evaluation, was added to the manuscript. The previous reference [84] was deleted, as suggested.

New reference: "[84] Benemaran, R.; Esmaeili-Falak, M.; Javadi, A. Predicting resilient modulus of flexible pavement foundation using extreme gradient boosting based optimised models. I. J. Pavement Eng. 2022, 2095385."

Deleted reference: [84] Piryonesi, S.; El-Diraby, T. Using Machine Learning to Examine Impact of Type of Performance Indicator on Flexible Pavement Deterioration Modeling. J. Infrastruct. Sys. 2021, 27(2), 04021005. We noticed that the new reference [84] contains a lot of useful statistical treatment techniques.

 

2. In comment 3, your answer did not satisfy me. The authors should either perform a sensitivity analysis with respect to different percentages of training and testing data, or use valid references for this purpose. In this case, the authors are not allowed to apply personal ideas!

→ As requested, the authors performed a sensitivity analysis with different training and testing ratios and added a new Section, "6.4 Sensitivity Analysis of ML models." Table 11 and Figure 14 were added to the manuscript.

"6.4. Sensitivity Analysis of ML models

Sensitivity analysis was performed to find a better ML model with various training and testing ratio. The results of sensitivity analysis are summarized in Table 11 and Fig. 14. All ML models show the highest R2 value when the training and testing ratio is 90:10. When the training and testing ratio is 90: 10, the R2 value of the GB model is 0.915 which is the best value among the sensitivity analysis results.

Table 11. Sensitivity analysis of ML models with different training and testing ratios.

Model             | R2 (75:25) | R2 (80:20) | R2 (85:15) | R2 (90:10)
Random Forest     | 0.756      | 0.751      | 0.825      | 0.893
Neural Network    | 0.677      | 0.659      | 0.681      | 0.730
Gradient Boosting | 0.811      | 0.823      | 0.882      | 0.915
AdaBoost          | 0.742      | 0.749      | 0.839      | 0.876

Figure 14. Results of the sensitivity analysis with various training and testing ratios."
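The sensitivity loop itself is simple to reproduce; the sketch below iterates over the four ratios in Table 11 on toy data (the data and model are stand-ins, so the numbers will not match the table).

```python
# Hedged sketch of the split-ratio sensitivity analysis on toy data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.random((310, 6))
y = 5.0 - 4.0 * X[:, 3] + 0.1 * rng.standard_normal(310)

for train_frac in (0.75, 0.80, 0.85, 0.90):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=train_frac,
                                              random_state=3)
    model = GradientBoostingRegressor(random_state=3).fit(X_tr, y_tr)
    ratio = f"{round(train_frac * 100)}:{round((1 - train_frac) * 100)}"
    print(ratio, "test R2 = %.3f" % r2_score(y_te, model.predict(X_te)))
```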

 

3. In comments 10 and 18, the authors did not address the comments. They have stated that "Authors tried to apply the PI and A-10index according to the reviewer's suggestion, but it was difficult to find information about them". All these indexes are introduced in the reference paper ("Predicting resilient modulus of flexible pavement foundation using extreme gradient boosting based optimised models." International Journal of Pavement Engineering (2022): 1-20.).

→ As requested, the A-10 index and PI were calculated and listed in the newly created Table 12. Additionally, a comprehensive evaluation was performed and added to the manuscript in a new "Section 6.5 Comprehensive Evaluation of ML Models."

 

"6.5 Comprehensive Evaluation of ML Models

In addition to the classic model performance evaluation indices such as R2, MSE, and MAE, new indices such as the VAF, PI, and A10−index were proposed by Benemaran et al. [84] to assess the efficiency of the developed models. It was noted that smaller RMSE, MAE, and PI values indicate more trustworthy statistical impressions [84]. The PI and A10−index are represented by Equations (1)-(2) [84]:

PI = RMSE / (ȳ (1 + R))                                                                  (1)

A10−index = m10 / M                                                                      (2)

where ȳ is the mean of the observed values and R is the correlation coefficient. Also, M represents the sample number, and m10 is the number of data with a ratio of measured to predicted value between 0.9 and 1.1 [84].

In this study, five model performance indices (RMSE, MAE, R2, A10−index, and PI) were assessed to give a comprehensive comparison. The models were scored from 1 to 4 based on each of the 5 indices; the scores were then summed to assign a total score to each model. The results of this comparison are listed in Table 12, which shows that the Gradient Boosting model has the best performance, while the Neural Network model has the lowest accuracy for the testing data.

Table 12. Comprehensive Evaluation of ML models."

Model             | RMSE  | score | MAE   | score | R2    | score | A10−index | score | PI    | score | total score
Random Forest     | 0.381 | 3     | 0.288 | 3     | 0.893 | 3     | 0.323     | 3     | 0.048 | 3     | 15
Neural Network    | 0.606 | 1     | 0.493 | 1     | 0.730 | 1     | 0.290     | 2     | 0.081 | 1     | 6
Gradient Boosting | 0.339 | 4     | 0.280 | 4     | 0.915 | 4     | 0.387     | 4     | 0.043 | 4     | 20
AdaBoost          | 0.410 | 2     | 0.304 | 2     | 0.876 | 2     | 0.226     | 1     | 0.052 | 2     | 9
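Both indices are a few lines of NumPy; the sketch below uses the PI form RMSE/(ȳ(1 + R)) as reconstructed in Equations (1)-(2) — our reading of [84], consistent with the values in Table 12 — and toy arrays rather than the study data.

```python
# Hedged sketch of the PI and A10-index as reconstructed above; toy data.
import numpy as np

def pi_index(y_obs, y_pred):
    rmse = np.sqrt(np.mean((y_obs - y_pred) ** 2))
    r = np.corrcoef(y_obs, y_pred)[0, 1]       # correlation coefficient R
    return rmse / (y_obs.mean() * (1.0 + r))   # PI = RMSE / (ybar * (1 + R))

def a10_index(y_obs, y_pred):
    ratio = y_obs / y_pred                     # measured-to-predicted ratio
    return np.mean((ratio >= 0.9) & (ratio <= 1.1))  # m10 / M

y_obs = np.array([3.1, 4.2, 5.0, 4.4, 3.8])
y_pred = np.array([3.0, 4.5, 4.9, 4.6, 3.9])
print(f"PI = {pi_index(y_obs, y_pred):.3f}, A10 = {a10_index(y_obs, y_pred):.3f}")
```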

4) The box plot is also wrongly drawn. The Taylor diagram is also ignored!

→ Since the reviewer pointed out that the box plot was wrongly drawn, and the authors are not sure which box plot is meant and required, the authors would like to delete the box plot. Instead, the authors worked out what a Taylor diagram is and drew a Taylor diagram of the ML models. Also, the authors deleted the old Fig. 14 (the box plot), and the Taylor diagram was added as Fig. 15. The manuscript has been revised at the end of Section 6.5, as follows, to reflect your comments.

 

"Also, the Taylor diagram of the 4 developed ML models was presented in Fig. 15. It is seen from the graph that the Gradient Boosting model has the best performance, while the Neural Network model has the worst performance with the average data excluding outliers."

 

Figure 15. Taylor diagram of the developed ML models."
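For readers unfamiliar with the construction, a Taylor diagram places each model at an angle given by its correlation with the observations and a radius given by the standard deviation of its predictions. The minimal matplotlib sketch below uses illustrative numbers, not the study's statistics, and polar axes stand in for a dedicated Taylor-diagram tool.

```python
# Hedged sketch of Taylor-diagram coordinates on matplotlib polar axes; toy numbers.
import numpy as np
import matplotlib.pyplot as plt

obs_std = 1.0  # standard deviation of the observed logN (illustrative)
models = {"Gradient Boosting": (0.96, 0.97), "Random Forest": (0.95, 0.94),
          "AdaBoost": (0.94, 0.95), "Neural Network": (0.86, 0.88)}  # (corr, std)

fig = plt.figure()
ax = fig.add_subplot(projection="polar")
ax.set_thetamin(0)
ax.set_thetamax(90)
ax.plot(0.0, obs_std, "k*", markersize=12, label="Observed")  # reference point
for name, (corr, std) in models.items():
    ax.plot(np.arccos(corr), std, "o", label=name)  # angle = arccos(correlation)
ax.legend(loc="upper right", fontsize=8)
plt.show()
```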

Author Response File: Author Response.pdf
