**1. Introduction**

Moisture damage is one of the main problems of asphalt pavement in service, and it has afflicted road networks around the world for decades [1]. The distress is hard to detect because it initially develops beneath the surface, in the middle and lower layers, as potential damage [2]. Once a response emerges on the surface, the surface course can fail within just a few days, which may lead to serious safety problems. Moreover, unexpected distress on the road requires rapid maintenance, which suspends traffic, consumes large quantities of raw materials, and emits harmful smoke [3,4]. This is a carbon-intensive process that costs not only considerable time and money but also natural resources [5]. Precisely predicting the positions of potential damage is one promising way to solve this problem [6–8].

Machine learning (ML) is well suited to building high-performance prediction models. In practice, maintenance information can be fed continuously into the prediction model. Then, if a section of road has already lost strength in the inner or middle layers of the structure but shows little or no response on the surface, the model can find it. Once the positions are determined, potential distress can be eliminated in time instead of breaking out. A significant problem can thus be reduced to a minor one, which can save lives and increase road value. Additionally, both the performance and the interpretability of the model are important for its evaluation and application. The purpose of this study is to use actual detection data to construct a highly practical model to solve the daily problem of road maintenance. It can serve as a complementary method that helps engineers judge the state of the pavement more precisely. The construction process and hyperparameter optimization of the model also provide an example to support the development and improvement of ML models in highway engineering.

**Citation:** Guo, X.; Hao, P. Using a Random Forest Model to Predict the Location of Potential Damage on Asphalt Pavement. *Appl. Sci.* **2021**, *11*, 10396. https://doi.org/10.3390/app112110396

Academic Editors: Luis Picado Santos, Amir Tabakovic, Jan Valentin and Liang He

Received: 15 September 2021 Accepted: 4 November 2021 Published: 5 November 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

School of Highway, Chang'an University, Xi'an 710064, China; gangxo@outlook.com

### *1.1. Moisture Damage and Potential Distress*

Moisture damage is one of the main forms of potential distress that results in strength loss, stripping, and deformation of pavements [9]. It is generally caused by segregation during construction, which manifests as poor bonding and compaction [10]. This category of distress is called potential damage because it is hard to detect: in the initial phase its external features change little, although strength has already been lost [11]. Under a certain load, a sudden breakdown may occur on the road surface [12]. Because the distress is hidden, it places more pressure on maintenance work and driving safety than routine conditions do [1].

To solve this problem, traditional methods such as geological radar and the falling weight deflectometer (FWD) are used in field projects to detect potential failure within road structures [13–15]. Considerable effort has gone into establishing long-term detection routines [16]. However, the accuracy of these methods is limited. Because moisture damage and potential distress cause multiple driving problems, they cannot be determined by tests from a single perspective under wide-ranging and complex in-field conditions [9]. A simple linear relationship cannot capture or explain the correlations between pavement properties and in-field failure. Furthermore, predictions and classifications derived from the traditional methods may be unstable, because most nondestructive detection methods rely on personal experience to judge potential damage [17]. For example, even though geological radar technology has become more automatic and precise for pavement detection and radar images are processed by highly advanced software, the detection results are easily affected by differing detection conditions, which can lead to misjudgment by less experienced engineers.

### *1.2. Machine Learning and Random Forest*

Machine learning (ML) is a technology that uses algorithms to let computers analyze data and handle other tasks by simulating the way humans learn; the algorithms can continually improve their own accuracy and capability [18]. ML plays a significant role in statistical research, driven by the rapid development of computational speed and artificial intelligence (AI) algorithms [19]. An ML model can be trained, like a project manager, to perform classification, prediction, and mining of interrelationships in data [20]. Almost every scientific discipline is driven by AI in this big data era, and the volume of data grows day by day [21]. Consequently, computer-aided scientific research has become more popular, and general demand for it among researchers is higher nowadays.

Highway engineering is a traditional discipline of applied science whose foundation is also built on tests and data. With the automation of instruments in recent years, the volume of data from pavement detection has grown rapidly [22]. Two major approaches can be used to understand the inner relationships of big data in highway engineering: simplification and comprehension. Because highway engineering is an empirical mechanical science, the first approach, which combines simplified data with hypotheses, can make problems easier and solve them with mechanical models [23]. However, to obtain a general result, it overlooks some of the experimental characteristics and the randomness of the data. Factors with low weights, which are ignored in empirical mechanical models, also affect the results. In fact, some unseen loss of capability already exists before distresses appear, and it cannot be measured [24]. That is why the traditional methods can explain causes well but cannot accurately predict outcomes for potential damage [25,26]. ML models can therefore be a strong complement to traditional methods. Neural networks, gradient-boosted models, random forests, and support vector machines have been used to mine data from long-term archival or open-access databases of pavement detection [27–30]. With these ML models, the relationships in the data can be found and understood more comprehensively than with conventional physical models. Furthermore, their excellent prediction performance supports better decisions. All in all, ML is an advantageous tool in experimental and theoretical studies for highway projects. In practice, model adaptability, model structure, and input variables are the three key matters to consider carefully when constructing an ML model.

Random forest (RF) is a promising machine learning algorithm that can help researchers forecast or classify data with high performance [31–33]. It is an ensemble of decision trees built with a modified bagging method to improve predictive accuracy [34]. A common ML strategy for solving a nonlinear problem, used by kernel SVMs and neural networks, is to raise the dimension of the data through different weights and biases in order to discover key features. This data transformation increases computational complexity. Combined with the computational frameworks of these models, it may lower overall computing efficiency in a multivariate classification problem. For instance, under the one-vs-rest framework, an SVM consumes a great deal of memory as the number of variables (data dimensions) increases, especially with a nonlinear kernel. In contrast to this strategy, RF uses the bagging method to organize data into a tree-like 2D structure, which preserves the simplicity of the data. It therefore offers outstanding computing speed and interpretability. In addition, with the bagging method RF can perform as well as kernel SVMs and neural networks [35]. It has been successfully applied to the prediction of IRI, strength, and cracking of pavements, with great performance [36–38]. Moreover, its most important advantage is that RF is good at processing multicollinear, imbalanced, and missing data with multiple variables [39]. That is why the RF model is suitable for data derived from in-field tests and detections.
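The bagging idea described above can be illustrated with a minimal sketch. It is not the model built in this study: the trees are reduced to depth-1 stumps, the two features (a deflection-like value and a moisture-like index) and all data values are hypothetical, and the function names `fit_stump`, `fit_forest`, and `predict` are chosen only for illustration. Each tree is fitted on a bootstrap sample with a random subset of features, and the forest predicts by majority vote.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, feature_ids):
    """Fit a depth-1 tree: pick the (feature, threshold) with lowest weighted Gini."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not right:  # threshold at the maximum value: nothing to split off
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # the sampled feature is constant: always predict the majority
        m = majority(y)
        return (feature_ids[0], float("inf"), m, m)
    return best[1:]  # (feature, threshold, label_if_left, label_if_right)


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Bagging: each stump sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]        # sample rows with replacement
        feats = rng.sample(range(len(X[0])), n_features)  # random subset of features
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over the predictions of all stumps."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)


# Hypothetical toy data: (deflection-like value, moisture-like index); 1 = damaged.
X = [(0.20, 1.0), (0.25, 1.5), (0.30, 1.2), (0.35, 2.0), (0.40, 1.8),
     (0.70, 4.0), (0.75, 4.5), (0.80, 5.0), (0.85, 4.2), (0.90, 5.5)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

forest = fit_forest(X, y)
print(predict(forest, (0.10, 0.5)), predict(forest, (0.95, 6.0)))
```

A practical RF grows deeper trees and tunes the number of trees and of features sampled per split; the sketch only shows why averaging many trees fitted on resampled data gives a stable vote.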

In summary, a random forest (RF) model can be trained to predict potential distress and moisture damage in flexible pavement. By accounting for the complex factors of the test environment, it can avoid the deviation of the typical prediction method, which may simply be extracted from a linear regression model. In this study, an RF model is constructed and trained on data from a full-scale track road test to predict potential damage. The prediction performance of the model and the relative importance of its factors are estimated. Finally, the RF model can help develop and promote methods for analyzing underlying relationships and project problems.
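As an illustration of how the relative importance of factors can be estimated, the sketch below uses permutation importance: one feature column is shuffled at a time and the resulting drop in accuracy is recorded. This is one common measure and is an assumption here, since this section does not state which importance measure the study uses. The function `toy_model`, the two features, and the data are hypothetical stand-ins for a trained model and field measurements.

```python
import random


def accuracy(predict_fn, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict_fn(row) == lab for row, lab in zip(X, y)) / len(y)


def permutation_importance(predict_fn, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled at a time."""
    rng = random.Random(seed)
    baseline = accuracy(predict_fn, X, y)
    importances = []
    for f in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[f] for row in X]
            rng.shuffle(column)  # break the link between feature f and the labels
            X_perm = [row[:f] + (v,) + row[f + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict_fn, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical stand-in for a trained model: it only looks at feature 0."""
    return 1 if row[0] > 0.5 else 0


# Hypothetical data: feature 0 drives the label, feature 1 is unrelated noise.
X = [(0.20, 7.0), (0.25, 3.0), (0.30, 9.0), (0.35, 1.0), (0.40, 5.0),
     (0.70, 2.0), (0.75, 8.0), (0.80, 4.0), (0.85, 6.0), (0.90, 0.0)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

importances = permutation_importance(toy_model, X, y)
print(importances)
```

Because `toy_model` ignores feature 1, shuffling that column leaves every prediction unchanged, whereas shuffling feature 0 lowers accuracy; a larger drop indicates a factor the model relies on more heavily.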
