### 2.2. Machine Learning Methods Employed

Two ensemble machine learning methods, gradient boosting and random forest, were used to accomplish the objectives of this research. Both were implemented in Python through the Anaconda Navigator program, and Spyder 4.3.5 was used to execute the gradient boosting and random forest models. These machine learning methods are typically used to predict desired outputs from input factors, and they are capable of forecasting temperature effects, strength properties, and the durability of materials [40,41]. Ensemble machine learning methods commonly combine weak learners by constructing 20 submodels that are trained on the data and tuned to maximize the R<sup>2</sup> value. The strategies used to choose optimal hyperparameters include splitting the data into training and testing sets (80% for training and 20% for testing), selecting the optimal submodel based on R<sup>2</sup>, and applying the k-fold analysis method. R<sup>2</sup> represents the performance/validity of the machine learning approaches: it quantifies the proportion of variance in the response variable explained by a model, i.e., it expresses the model's fit to the data quantitatively. A value near zero implies that fitting the model performs no better than fitting the mean, whereas a value near one indicates that the model matches the data almost perfectly [42]. The subsections below discuss the machine learning techniques employed in this study. All machine learning methods are validated using k-fold assessment, statistical checks, and error measures (root-mean-square error (RMSE) and mean absolute error (MAE)). Furthermore, a sensitivity analysis is performed to determine the effect of each input variable on the predicted outcomes. The flow diagram in Figure 3 illustrates the research method used in this study.

**Figure 3.** Flowchart of research methods.
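To make the validation workflow above concrete, the following is a minimal sketch of the 80%/20% split, the selection of the best of 20 submodels by R<sup>2</sup>, the k-fold assessment, and the RMSE/MAE error measures, using scikit-learn. The dataset, the tree-count grid, and the fold count are illustrative assumptions, not the study's actual data or configuration.

```python
# Minimal sketch of the validation workflow described in Section 2.2.
# Data, the 50-trees-per-step grid, and k = 10 folds are assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split

# Placeholder data standing in for the study's input factors and response.
X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=0)

# 80% of the data for training and 20% for testing, as stated in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Train 20 submodels and keep the one with the highest R2 on the test set.
best_r2, best_model = -np.inf, None
for n in range(1, 21):
    model = GradientBoostingRegressor(n_estimators=50 * n, random_state=0)
    model.fit(X_train, y_train)
    score = r2_score(y_test, model.predict(X_test))
    if score > best_r2:
        best_r2, best_model = score, model

# Error measures for the selected submodel.
y_pred = best_model.predict(X_test)
rmse = mean_squared_error(y_test, y_pred) ** 0.5
mae = mean_absolute_error(y_test, y_pred)
print(f"best R2 = {best_r2:.3f}, RMSE = {rmse:.3f}, MAE = {mae:.3f}")

# k-fold assessment of the selected configuration (fold count assumed).
cv_r2 = cross_val_score(best_model, X, y, cv=10, scoring="r2")
print(f"10-fold R2: mean = {cv_r2.mean():.3f} +/- {cv_r2.std():.3f}")
```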

#### 2.2.1. Gradient Boosting

Friedman [43] presented gradient boosting as an ensemble strategy for classification and regression in 1999; in this study, it is applied as a regression technique. As seen in Figure 4, the gradient boosting technique fits a base model to a randomly chosen subsample of the training set at each iteration. First, a weak predictor is constructed using all of the training data, and the training data are then predicted using this weak predictor. From the predicted outcomes, the residual for each training instance is easily calculated. The execution of gradient boosting may be sped up, and its accuracy increased, by randomly subsampling the training data, which also helps to prevent overfitting. The lower the subsampling percentage, the faster the regression, because the model must fit less data at every iteration. Gradient boosting algorithms require tuning parameters, including n-trees and the shrinkage rate, where n-trees is the number of trees to be generated; n-trees should not be set too low, while the shrinkage factor (commonly referred to as the learning rate, applied to all trees during development) should not be set too high [44].

**Figure 4.** Schematic representation of the gradient boosting technique.
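As an illustration of the tuning parameters discussed above, the sketch below configures scikit-learn's `GradientBoostingRegressor`; in its naming, `n_estimators` corresponds to n-trees, `learning_rate` to the shrinkage rate, and `subsample` below 1.0 enables the random subsampling of the training data. All parameter values shown are assumed for illustration, not taken from the study.

```python
# Illustrative gradient boosting configuration (parameter values assumed).
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Placeholder data standing in for the study's input factors and response.
X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=0)

gbr = GradientBoostingRegressor(
    n_estimators=500,    # n-trees: should not be set too low
    learning_rate=0.05,  # shrinkage rate applied to all trees: not too high
    subsample=0.8,       # each tree sees a random 80% of the training rows
    random_state=0,
).fit(X, y)
```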

#### 2.2.2. Random Forest

Random forests are deployed by bagging decision trees using the random split choice technique [45]. The modeling procedure for the random forest approach is illustrated schematically in Figure 5. Each tree in the forest is generated from an arbitrarily selected training set, and each split inside a tree is constructed from an arbitrarily chosen subgroup of input factors, yielding a forest of trees [46]. This element of randomness adds variation among the trees. The forest as a whole is composed entirely of fully grown binary trees. The random forest technique has established itself as a highly effective tool for general-purpose classification and regression. Even when the number of variables surpasses the number of observations, the technique has demonstrated improved precision by aggregating the predictions of several randomized decision trees. Additionally, it is adaptable to both large-scale and ad hoc learning tasks, and it yields measures of variable importance [47].

**Figure 5.** Schematic representation of the random forest technique [45].
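The sketch below is a minimal random forest example in scikit-learn, with hyperparameter values assumed for illustration: each tree is grown on its own bootstrap sample of the training set, and each split considers only a random subgroup of the input factors, mirroring the procedure described above.

```python
# Illustrative random forest sketch (hyperparameter values assumed).
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Placeholder data standing in for the study's input factors and response.
X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=0)

rf = RandomForestRegressor(
    n_estimators=200,     # number of fully grown trees in the forest
    max_features="sqrt",  # random subgroup of inputs considered at each split
    bootstrap=True,       # each tree trained on its own bootstrap sample
    random_state=0,
).fit(X, y)

# Per-input importance scores, one way to support the sensitivity analysis
# mentioned in Section 2.2 (the study's exact procedure may differ).
print(rf.feature_importances_)
```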
