*4.2. Bagging*

Bootstrap aggregation, or bagging, is a powerful procedure for improving the learning behavior of decision trees: by reducing the high variance of a single-tree structure, it achieves a lower root mean squared error (RMSE) [42].

The training data are split at random into multiple subsets, each of which is used to fit an independent decision tree model. The predictions across all the trees are then averaged, which weakens the correlation effects between any pair of trees. The process is expressed as follows:


$$f_{\text{bag}}(\mathbf{x}) = \frac{1}{B} \sum_{b=1}^{B} f^{*b}(\mathbf{x}) \tag{1}$$

Although every single-tree model in this procedure has high variance, averaging *B* trees, typically hundreds of them, reduces the variance of the ensemble as a whole.
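As an illustration of Equation (1), the following sketch (not from the paper, which builds its model in R) bags *B* bootstrap-trained, high-variance predictors; a 1-nearest-neighbour rule stands in for a decision tree on toy data:

```python
import numpy as np

# Illustrative sketch: averaging B bootstrap-trained, high-variance
# predictors reduces error; a 1-NN rule stands in for a decision tree.
rng = np.random.default_rng(0)

x = np.linspace(-1.0, 1.0, 200)
y = x ** 2 + rng.normal(0.0, 0.2, size=x.size)   # noisy training targets

def fit_predict(sample_idx, x_query):
    """Predict at x_query with a 1-NN rule trained on one bootstrap sample."""
    xs, ys = x[sample_idx], y[sample_idx]
    nearest = np.abs(xs[None, :] - x_query[:, None]).argmin(axis=1)
    return ys[nearest]

B = 100
x_query = np.linspace(-1.0, 1.0, 50)
preds = np.stack(
    [fit_predict(rng.integers(0, x.size, x.size), x_query) for _ in range(B)]
)

f_bag = preds.mean(axis=0)   # Equation (1): the average over B trees
truth = x_query ** 2
single_rmse = np.sqrt(((preds[0] - truth) ** 2).mean())
bagged_rmse = np.sqrt(((f_bag - truth) ** 2).mean())
```

On this toy problem the bagged RMSE falls well below that of any single bootstrap predictor, which is the variance-reduction effect described above.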

### *4.3. Out-of-Bag Performance*

For a classification problem with qualitative outcomes, a voting strategy is adopted: the predicted class of each tree is recorded and the most frequently occurring class is picked. This provides a straightforward way to assess the error performance of a bagged prediction model.

Out-of-bag (OOB) observations, those not included in a tree's bootstrap sample, are predicted by the trained model. Comparing these predictions with the actual observations, the classification error, or test error, can be accumulated.

Let the testing data space be *T*, which contains *n* observations:

$$T = \{ (\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \dots, (\mathbf{x}_n, y_n) \}$$

Feeding *T* to the given RF model, we obtain another data set:

$$T_f = \{ (\mathbf{x}_1, y_{f1}), (\mathbf{x}_2, y_{f2}), \dots, (\mathbf{x}_n, y_{fn}) \}$$

Therefore,

$$\text{OOB error} = \frac{\text{number}\left(y_i \neq y_{fi}\right)}{n} \tag{2}$$
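Equation (2) amounts to a misclassification rate. A minimal sketch with hypothetical labels (the names `y` and `y_f` mirror the equation; the values are invented for illustration, not taken from the paper):

```python
import numpy as np

# Hypothetical labels: y holds the observed class of each out-of-bag point
# and y_f the class predicted by the forest, as in Equation (2).
y = np.array(["Y", "N", "Y", "Y", "N", "N", "Y", "N"])
y_f = np.array(["Y", "N", "N", "Y", "N", "Y", "Y", "N"])

# Equation (2): OOB error = number(y_i != y_fi) / n
oob_error = np.mean(y != y_f)   # 2 mismatches out of 8
```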

### *4.4. The State-of-the-Art Method*

State-of-the-art (SOTA) methods are applied to check whether the RF model achieves the best performance in this learning task. Decision tree and support vector machine (SVM) models are selected for comparison with the trained RF model. The decision tree model, which has good predictive strength on its own, is the base unit of the RF model, and the SVM is also a high-performance classification algorithm. Both are commonly used in data mining.

Decision trees are constructed from significance measurements of the data, while the SVM is built on a linear kernel. The accuracy of these models is set as the baseline for the comparison.
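The paper builds these baselines in R; an equivalent pair can be sketched with scikit-learn on synthetic data (the road dataset is not public): a decision tree and a linear-kernel SVM scored on a held-out split.

```python
# Baseline sketch: decision tree vs. linear-kernel SVM on synthetic data,
# standing in for the paper's R baselines.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree_acc = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr).score(X_te, y_te)
svm_acc = SVC(kernel="linear").fit(X_tr, y_tr).score(X_te, y_te)
```

The held-out accuracy of each baseline then serves the same role as the 65% and 65.52% figures reported later in Section 6.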

### *4.5. Relative Importance of Variables*

Even though the structure of bagged trees grows much larger to gain a significant improvement over a single tree, the whole model becomes harder to interpret. To compute the relative importance of each variable in the RF model, the importance value of each predictor in every single tree is recorded and accumulated, allowing the predictors to be compared. In this way, the most effective factor for a given predicted result can be identified. A high relative importance indicates a significant weight in the relationship, i.e., a more important factor in the road deterioration process. The importance of each variable can be summed from the reduction in the loss function attributed to each split in a given tree.
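The accumulation described above can be sketched with scikit-learn in place of the paper's R code: each tree records the loss-function (impurity) reduction attributed to every split, and the per-tree importances are averaged across the forest.

```python
# Per-predictor importance accumulated across all trees in the forest.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Each tree's feature_importances_ is the normalized impurity reduction
# attributed to each predictor's splits; averaging gives the forest-level
# relative importance.
per_tree = np.stack([t.feature_importances_ for t in rf.estimators_])
importance = per_tree.mean(axis=0)
```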

### **5. Model Construction**

### *5.1. Model Structure Design*

Four key steps are organized, as shown in Figure 2. A cyclic process is adopted to train the RF model and optimize the model parameters repeatedly until a minimal OOB error is obtained. Firstly, the quality of the given database is the most important element, as it is the foundation of the whole model structure. Next, the main body of the RF model for model training is built with the R language and its packages. The process returns to this step several times during model optimization, and the interaction of these two steps determines the final model structure and parameters, which are then applied to the testing data. Finally, the prediction error rate is estimated to assess the model performance. If the performance is insufficient, the cyclic process must run again, adding new data and checking the data effectiveness and correlations, until the best fit of the RF model is achieved.

**Figure 2.** Structure design of the RF model.

### *5.2. RF Model Construction*

Once the structure of the RF model for potential damage is decided, the training set is input to fit and grow every single tree with two key hyperparameters, *mtry* and *ntree*. The *mtry* is the number of variables tried at each split; the *ntree* is the total number of trees the forest grows. For classification, every tree in the forest is run down with *m* variables considered when splitting each node. Trees are grown as large as possible without pruning. Because a random forest does not overfit as trees are added, the number of single trees can be grown as large as the computing capacity allows, and the OOB error keeps decreasing as the tree number increases. When all the data have been run down the trees, the proximities, OOB error, and variable importance are computed. Finally, the most likely result is chosen by majority voting to obtain a confident prediction. The process is shown in Figure 3.

**Figure 3.** RF model construction process.
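The construction step above can be sketched in scikit-learn terms (the paper itself uses R's randomForest package): *ntree* maps to `n_estimators`, *mtry* to `max_features`, and `oob_score=True` evaluates each point only with trees that did not see it.

```python
# Sketch of the RF construction step with the two key hyperparameters;
# synthetic data stand in for the 35-variable road dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=35, n_informative=10,
                           random_state=0)

rf = RandomForestClassifier(
    n_estimators=400,   # ntree: total number of trees the forest grows
    max_features=5,     # mtry: variables tried at each split
    oob_score=True,     # score each point with trees that did not see it
    random_state=0,
).fit(X, y)

oob_error = 1.0 - rf.oob_score_   # majority vote over trees decides the class
```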

### **6. Results and Discussions**

### *6.1. Data Characteristics and Correlations*

As many related variables as possible are considered in this study for a comprehensive understanding. Accordingly, thirty-four categories of data on the properties of the in-field road are prepared for training. Data collection is the most important step before a model is constructed, as the data sources and features matter for the prediction results. The details of the data used in the training process cannot be exhibited here because of the size of the data group; instead, a general view of the characteristics and correlations of the data sets is plotted as a matrix in Figure 4.

The diagonal plots show the distribution of each variable: all datasets collected from the road properties are approximately normal or can be standardized into a normal distribution. The training data are therefore effective and can work reasonably in the model. A dataset with a normal distribution fits the principles of averaged detection data, so there is no need to delete a low-quality or abnormal variable.
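A toy illustration of the checks behind Figure 4, using synthetic stand-in columns (the road data are not public): each variable is standardized to zero mean and unit variance, and pairwise correlations are inspected.

```python
import numpy as np

# Three synthetic, roughly normal "road property" columns with their own
# means and scales, standing in for the real detection data.
rng = np.random.default_rng(1)
raw = rng.normal(loc=[5.0, 30.0, 0.2], scale=[2.0, 8.0, 0.05], size=(1000, 3))

standardized = (raw - raw.mean(axis=0)) / raw.std(axis=0)  # zero mean, unit sd
corr = np.corrcoef(standardized, rowvar=False)             # pairwise correlations
```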

The off-diagonal plots between every pair of properties show their correlation index and fitted curves. Some pairs clearly have linear correlations, which are always desirable and easy to evaluate in a typical numerical analysis. The other data, with non-linear relationships, are hard to derive rules for, so there are no consistent principles by which these factors alone can determine the occurrence of potential damage. The RF model helps by combining and following all hints from the variables, even those that are individually unimportant, to achieve the best prediction.

**Figure 4.** Data characteristics and correlations matrix: the \* represents the significance level.

### *6.2. Number of Trees and Number of Variables Tried at Each Split*

The two key hyperparameters, *ntree* and *mtry*, are determined by an exhaustive method. At first, an RF model was constructed with the default settings *ntree* = 500 and *mtry* = 5. The OOB estimate of the error rate of this model is 20.24%, and the confusion matrix is shown in Table 2.

Through the exhaustive method, *mtry* is varied from 1 to 35 in the default RF model with the other parameters fixed, to find the minimal error rate. Using the same method, *ntree* is then traversed with the *mtry* value fixed. The results and the processes are presented in Figures 5 and 6.
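The exhaustive scan over *mtry* can be sketched as follows, using the OOB error as the criterion (the paper scans 1 to 35 in R; a smaller synthetic problem and grid are used here for speed):

```python
# Exhaustive search over mtry (max_features) with OOB error as the score.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=12, n_informative=6,
                           random_state=0)

errors = {}
for mtry in (1, 4, 8, 12):                       # candidate max_features values
    rf = RandomForestClassifier(n_estimators=200, max_features=mtry,
                                oob_score=True, random_state=0).fit(X, y)
    errors[mtry] = 1.0 - rf.oob_score_           # OOB error for this mtry

best_mtry = min(errors, key=errors.get)          # setting with minimal error
```

The same loop, run over `n_estimators` with `max_features` fixed, corresponds to the second traversal over *ntree*.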


**Table 2.** Confusion matrix of the default RF model.

Y presents the points marked as distress; N presents the points marked as in good condition.

**Figure 5.** The relationship between *mtry* and the error rate.

With the increase in *mtry* from 1 to 35, the error rate keeps decreasing. In general, the number of variables tried at each split in an RF model is chosen in the range between one and the total number of variables, and a balance is always needed between lower correlation among single trees and a certain prediction strength. It is therefore not a general law that the error rate of an RF model can be reduced by introducing more variables, unless the variables are all effective for the model and have little correlation.

In the given RF model, there are some correlated factors. However, this is not the main influence on model accuracy until *mtry* reaches 23. Before that point, the model does not consider enough variables, so adding variables helps strongly. After that point, or even after *mtry* = 19, the model improves little because fewer independent residual factors remain. Nevertheless, the optimal *mtry* value is 35, with the lowest error rate, which means that all factors carry their own weight in the model even though some are partly dependent.

With the increase in *ntree*, i.e., the number of trees the model generates, noise in the model can be reduced. Once *ntree* reaches a certain number, the error rate of the model stabilizes, so the best *ntree* value is chosen with the computing speed in mind. When *ntree* exceeded 400, the prediction error rates for Y, N, and the model average reached their lowest values and held that trend. Therefore, *ntree* is set to 400 for the RF model.

**Figure 6.** The relationship between *ntree* and the error rate.

### *6.3. The Optimized RF Model*

The final RF model used in training and prediction is obtained through the two optimization steps for the hyperparameters. The distribution of tree sizes, i.e., the node numbers of every tree in the forest, is shown in Figure 7. The most frequent tree size is six, which represents what the majority of trees in the forest look like.

**Figure 7.** The tree size and its occurrence frequency.

The optimized RF model is evaluated on the out-of-bag testing data. The accuracy performance of the model is shown in Table 3. The average OOB error rate is 16.67%, a clear improvement from 20.24%. For an in-field project, a prediction accuracy higher than 76% is considered good performance. Compared with some other studies on highway or road topics, this program includes more variables, which may improve the accuracy through more comprehensive consideration. In particular, the accuracy of the Y prediction, meaning that the road has a potential failure at that position, reaches 85.13%. This is very important for road maintenance and safety in practice, saving both money and lives.

**Table 3.** Confusion matrix of the optimized RF model.


Y presents the points marked as distress; N presents the points marked as in good condition.

A decision tree model was constructed to compare with the RF model. The result is shown in Figure 8 and Table 4. Moreover, a support vector machine (SVM) model was built, and the relative confusion matrix is shown in Table 5.

**Figure 8.** The decision tree model.

In general, the accuracies of the decision tree model and the SVM model are 65% and 65.52%, respectively, for the same prediction of potential damage. Nevertheless, the decision tree model is more logical and easier to interpret: in the tree, RI20, DI3, and VRS are the three most important factors for classifying the data, and the prediction probability is given. All in all, the performance of the RF model is outstanding among the three models.



**Table 4.** Confusion matrix of the decision tree model.

Y presents the points marked as distress; N presents the points marked as in good condition.

**Table 5.** Confusion matrix of the SVM model.


Y presents the points marked as distress; N presents the points marked as in good condition.

### *6.4. Model Application and Prediction Evaluation*

The RF model is applied, and its classification performance is examined by margins and multidimensional scaling (MDS) analysis. If the margin value of a test point is higher than zero, the point is identified as a correct prediction. As shown in Figure 9, the prediction margins approximately follow a normal distribution, and the main body of the predictions lies in the upper (positive) area, which indicates good model performance despite some abnormal points.

**Figure 9.** The margins of the RF model by class: (**a**) the margins of outputs checking the classification performance; (**b**) the outputs mapping from multiple dimensional data to 2 dimensions.
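The margin of a test point is the vote fraction for its true class minus the largest fraction for any other class; a positive margin means a correct prediction. A sketch on synthetic data (scikit-learn's `predict_proba` averages per-tree probabilities rather than counting raw votes, which serves as a close stand-in here):

```python
# Margin computation: true-class vote share minus the best rival's share.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

votes = rf.predict_proba(X_te)                     # per-class vote fractions
rows = np.arange(len(y_te))
true_votes = votes[rows, y_te]
others = votes.copy()
others[rows, y_te] = -1.0                          # mask out the true class
margins = true_votes - others.max(axis=1)

correct = rf.predict(X_te) == y_te                 # margin > 0 implies correct
```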

From another aspect, the MDS plot is made with the R language (Figure 10), with the positions of the predictions marked in this 2D map. The predictions, especially the main bodies of the Y and N predictions, have a clear boundary between each other. Some points, however, are mixed with the other class because of inevitable abnormal points; improving the detection accuracy in practice is the next step. Overall, the predictions are classified into two groups, demonstrating the classification ability of the RF model.

**Figure 10.** The MDS of the RF model.
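The MDS step can be sketched as follows (the paper makes this plot in R): the RF proximity of two points is the fraction of trees in which they share a leaf, and 1 − proximity is embedded into two dimensions with metric MDS. Synthetic data stand in for the road dataset.

```python
# RF proximity matrix embedded into 2D with metric MDS, as in Figure 10.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import MDS

X, y = make_classification(n_samples=120, n_features=8, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

leaves = rf.apply(X)                               # (n_samples, n_trees) leaf ids
proximity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(1.0 - proximity)        # 2D map of the predictions
```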

### *6.5. Factor Importance*

The importance of the variables is evaluated by the mean decrease in the accuracy index and the mean decrease in the Gini index to explain the RF model, as shown in Figure 11. The larger the decrease in these indexes, the more important the factor. RI20, i.e., the increase in rutting over 20 days, is the most important factor: if a point ruts deeply in a short time, it is most likely to develop moisture damage under the surface or surface failure soon. Following RI20, the DI3 (the increase in deflection in the third test stage), the rutting at the 20th and 5th days, and the point position stand out among all the variables. Of these five factors, it is notable that three are linked to rutting, which directly presents the state of the road; one highly important factor is linked to deflection, which presents the strength of the road; and the point position is effective because it encodes the project (i.e., the construction method, companies, or materials adopted on that stretch of road), which leads to this phenomenon. Based on these main factors, it can be asserted that, before moisture damage occurs, a significant increase in rutting and deflection must first be detected. Some original properties of the road, such as the original rutting, original deflection, and surface splitting strength, carry little weight in the model. This means that moisture damage depends on a cumulative effect rather than on the initial properties.

**Figure 11.** Factor importance.
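The mean-decrease-in-accuracy measure can be sketched with scikit-learn's `permutation_importance` (the paper computes both indexes with R's randomForest; the Gini counterpart corresponds to `rf.feature_importances_`):

```python
# Mean decrease in accuracy via permutation importance on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=300, n_features=6, n_informative=2,
                           n_redundant=0, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Shuffle each feature in turn and record the drop in accuracy.
result = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
ranked = result.importances_mean.argsort()[::-1]   # most important first
```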

The specific values of the variables and their relationships are analyzed with density curve plots. For the important variables, the overlap between the Y and N areas is smaller, which indicates a higher classification strength (Figure 12); such a variable therefore holds a bigger weight in the model.

To compare with the traditional analysis of properties, the three most important categories of factors from the RF model are selected for plotting their relationships. The void rate, the rutting increase over 20 days, and the increase in deflection in the third test stage are fitted in Figure 13.

**Figure 12.** Specified values of variables and their relationships.

Another interesting phenomenon is found: a void rate between 4% and 6% has the lowest probability of a large increase in rutting and deflection [43,44]. This finding is very similar to the Superpave construction principles. Therefore, to control moisture damage at the early age of a road, the most important thing is to minimize construction segregation. This result demonstrates the interpretability of the RF model, which can easily be connected to practical work, and shows that the prediction of the model is rational and logical.

**Figure 13.** The relationships amongst the three important factors.
