3.5.1. Model Development
The pavement condition of a transportation network varies widely across a geographic area. This spatial variation arises from differences in pavement structure, traffic load, and weather conditions over the network. It can be addressed by homogeneous segmentation, i.e., by dividing the network into segments with nearly uniform pavement conditions.
Comprehensive historical pavement data were not available for the Afghanistan road network, i.e., exact objective values of the pavement criteria (pavement structure and layer thickness, weather conditions, and traffic loading) were unavailable for the different segments. Therefore, each of these criteria was subjectively divided into two levels to be assigned to each family: pavement structure into thick (pavement thickness ≥ 150 mm) and thin (pavement thickness < 150 mm), weather conditions into harsh (annual freeze-thaw cycles ≥ 15) and mild (annual freeze-thaw cycles < 15), and traffic loading into heavy (AADT ≥ 12,000) and low (AADT < 12,000). As each of the three criteria had two levels, a full factorial experimental design comprising a total of 2³ (i.e., 8) families was defined; it is presented in Table 4 for performance model development. More than two levels could have been defined for each criterion, e.g., three groups for traffic load (heavy, medium, and low) instead of two. However, two levels were selected for two reasons: (1) the lack of data made it hard to find enough samples in each family to build a model, and (2) no significant difference was expected between the performance models of the additional families. From an engineering standpoint, it could be anticipated that Family 2, with a thick pavement structure, low traffic load, and mild weather conditions, would perform better than Family 7, with a thin pavement structure, heavy traffic load, and harsh weather conditions. This would be confirmed later via the pavement performance model development.
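The two-level classification and the resulting 2³ factorial layout can be sketched as follows. This is a minimal illustration only: the thresholds are those stated above, whereas the function name, constant names, and example inputs are hypothetical. It also assumes that numeric values are available for a segment, which the study notes was generally not the case, and it deliberately does not reproduce the family numbering of Table 4.

```python
from itertools import product

# Thresholds stated in the text; all identifiers are illustrative.
THICKNESS_MM = 150   # thick if pavement thickness >= 150 mm
FREEZE_THAW = 15     # harsh if annual freeze-thaw cycles >= 15
AADT = 12_000        # heavy if AADT >= 12,000


def classify(thickness_mm, freeze_thaw_cycles, aadt):
    """Return the (structure, weather, traffic) levels of one segment."""
    structure = "thick" if thickness_mm >= THICKNESS_MM else "thin"
    weather = "harsh" if freeze_thaw_cycles >= FREEZE_THAW else "mild"
    traffic = "heavy" if aadt >= AADT else "low"
    return structure, weather, traffic


# Full factorial design: every combination of the three two-level criteria.
families = list(product(("thick", "thin"), ("harsh", "mild"), ("heavy", "low")))

print(len(families))             # number of families in the design
print(classify(180, 10, 8_000))  # hypothetical segment
```

Adding a third level to any criterion would only require extending the corresponding tuple, at the cost of more families and fewer samples per family.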
To develop models for the pavement families, a univariate regression model was employed because it is simple and transparent (not a black box, as meta-heuristic models are) and therefore easy to implement in developing countries. As mentioned earlier, the total surveyed pavement length was 558.7 km. This length was divided into 100 m sections, resulting in 5587 sections. Because adjacent pavement sections within a family were similar, approximately every 5 km of adjacent sections was defined as a segment, leading to 112 segments. For modeling, the dataset of 112 segments was randomly divided into training and test subsets containing 80% and 20% of the data, respectively. An attempt was made to randomly select samples from different pavement conditions for both subsets in order to avoid sampling bias, which could otherwise lead to underfitting or overfitting. Finally, the RMSE and R² of the model were calculated and reported as the model’s metrics. The error was defined as the difference between the predicted (by the model) and actual values of the PCI.
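The section and segment counts, the 80/20 split, and the two reported metrics can be sketched as below. The lengths and the split ratio come from the text; everything else is an assumption made for illustration, including the use of a ceiling to arrive at 112 segments, the helper names, and the fixed random seed. The split shown is a plain random shuffle and does not attempt the condition-aware sampling described above.

```python
import math
import random

TOTAL_LENGTH_M = 558_700  # 558.7 km surveyed (from the text)
SECTION_M = 100           # section length (from the text)
SEGMENT_M = 5_000         # approximate segment length (from the text)

n_sections = TOTAL_LENGTH_M // SECTION_M            # 5587
n_segments = math.ceil(TOTAL_LENGTH_M / SEGMENT_M)  # 112 (rounding assumed)


def train_test_split(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of items and split it into train and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted PCI."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


train, test = train_test_split(range(n_segments))
print(n_sections, n_segments, len(train), len(test))
```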
To develop a model of PCI as a function of pavement age for the different families, the average PCI of the pavement segments was plotted against pavement age. In Figure 6, the empty blue circles show the mean PCI of each segment. The primary aim was to develop a model for each family; these family-level models are depicted in the figure as solid lines with filled start and end points. However, they were not meaningful because the pavement age range of each family was limited (mostly between 1 and 3 years), which in turn was due to the limited amount of data accessible for each family. Therefore, all PCI data were treated as a single family, meaning that only one pavement performance model was developed. The average PCI of all the segments was thus used to construct a performance model combining all families, shown as the solid black curve (the so-called master curve). Generally speaking, this curve represents the performance of all families; however, it loses some family-specific detail because all data are combined in a single performance model. As more data are collected for each family, a dedicated model can be developed for each one, which would represent its performance more accurately.
Using univariate linear and non-linear regression techniques, different models, i.e., first-order (simple linear), second-, third-, and fourth-order polynomial, exponential, power, and logarithmic models, were built and fitted to the training data (80% of the total gathered data). The models were compared in terms of model error (RMSE), coefficient of determination (R²), engineering sense (ES), agreement with the related literature (RL), and underfitting/overfitting (UO), as presented in Table 5. As can be seen in this table, the best model in terms of R² and RMSE alone is the fourth-order polynomial; however, it does not match engineering sense (ES) or the related literature (RL), and it also suffers from overfitting (UO). The best model overall is therefore the third-order polynomial, which has the next-best R² and RMSE, matches engineering sense (ES), agrees with the related literature (RL), and neither overfits nor underfits.
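The comparison of polynomial orders can be illustrated with a small least-squares sketch. It is not the procedure or the data of the study: the ages and PCI values are synthetic (generated from a hypothetical curve plus noise), the helper names are invented for this example, and the fit uses the normal equations in pure Python rather than any particular statistics package.

```python
import math
import random


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; returns a function age -> predicted PCI.

    Solves the normal equations by Gaussian elimination with partial
    pivoting. Ages are rescaled to [0, 1] to keep the system well conditioned.
    """
    scale = max(x) or 1.0
    t = [xi / scale for xi in x]
    n = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    m = [[sum(ti ** (i + j) for ti in t) for j in range(n)]
         + [sum(yi * ti ** i for ti, yi in zip(t, y))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - tail) / m[i][i]
    return lambda age: sum(c * (age / scale) ** k for k, c in enumerate(coef))


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_res / sum((a - mean) ** 2 for a in actual)


# Synthetic training data, NOT the study's measurements.
rng = random.Random(0)
ages = [a for a in range(21) for _ in range(3)]
pci = [100 - 0.004 * a ** 3 + rng.gauss(0, 3) for a in ages]

results = {}
for degree in (1, 2, 3, 4):
    model = fit_polynomial(ages, pci, degree)
    predicted = [model(a) for a in ages]
    results[degree] = (rmse(pci, predicted), r_squared(pci, predicted))
    print(degree, round(results[degree][0], 2), round(results[degree][1], 3))
```

On training data, a higher-order polynomial can never have a larger RMSE than a lower-order one, which is why RMSE and R² alone tend to favor the fourth-order model and why the additional criteria (ES, RL, UO) are needed to choose among the candidates.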
As can be seen in Figure 7a, the simple (first-order) polynomial model not only fits poorly but also cannot represent the variation in the pavement degradation rate. The second-order polynomial model does not make engineering sense, as it shows a significant increase in the PCI in the initial stages, which would not be feasible without maintenance actions. The third-order polynomial model shows a small increase in the PCI of about 2% from years 7 to 11, going up from 93% to 95%. This increase is an error, but it is negligible on a scale of 100. The fourth-order polynomial model shows sharp decreases and increases, which do not make engineering sense. Given the pattern of the data, the first- and second-order polynomial models suffer from underfitting, while the fourth-order model shows clear overfitting.
Figure 7a also shows that the standard deviation of the PCI increases with pavement age. Figure 7b presents the PCI histogram for all segments, which shows a slight positive skewness, meaning that the bulk of the data lie at medium and higher PCI values and that there are few very low PCI values. Moreover, Figure 7c shows how the non-linear regression models, i.e., exponential, power, and logarithmic, fitted the data. These models not only fitted the data poorly (high RMSE and low R²) but also lacked engineering sense, as they show a sharp decrease in the initial years after construction. This does not match reality, as the PCI should decrease only slightly at early ages.
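The engineering-sense criterion applied to the candidate curves (without maintenance, the predicted PCI should not rise with age) can be expressed as a simple numerical check. The sketch below is one possible formulation, not the authors' procedure: the function names, the grid step, the tolerance of 2 PCI points (chosen to mirror the rise the text treats as negligible), and both example curves are assumptions.

```python
TOLERANCE = 2.0  # assumed: largest PCI rise still regarded as negligible


def max_pci_rise(model, max_age, step=0.1):
    """Largest increase of the predicted PCI from any age to a later age.

    The curve is sampled on a regular grid from 0 to max_age and the
    biggest gain relative to the lowest earlier value is returned.
    """
    n = int(round(max_age / step))
    values = [model(i * step) for i in range(n + 1)]
    lowest_so_far = values[0]
    worst_rise = 0.0
    for value in values[1:]:
        worst_rise = max(worst_rise, value - lowest_so_far)
        lowest_so_far = min(lowest_so_far, value)
    return worst_rise


def makes_engineering_sense(model, max_age, tolerance=TOLERANCE):
    """True if the curve never rises by more than the tolerance."""
    return max_pci_rise(model, max_age) <= tolerance


# Hypothetical curves, NOT the fitted models of Table 5.
def monotone_curve(age):
    return 100 - 0.004 * age ** 3


def rising_curve(age):
    return 90 + 1.5 * age - 0.12 * age ** 2


print(makes_engineering_sense(monotone_curve, 20))
print(makes_engineering_sense(rising_curve, 20))
```

A check of this kind only flags unrealistic rises; an unrealistically sharp early drop, as described for the non-linear models, would need a separate test on the early-age slope.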
It is concluded that, as shown in Table 5 and Figure 8a, the best-fitted model is the third-order polynomial model. The most significant finding from this model is that the deterioration rate is higher in the pavement age range of 15 to 20 years than at earlier ages (i.e., 0 to 15 years). This could be a vital warning for road authorities to carry out preventive pavement maintenance before this range, so as to prevent the sharp degradation in road conditions, which leads to a much higher corrective maintenance cost than proactive maintenance.
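The comparison of deterioration rates between age ranges amounts to computing the average PCI loss per year over each range. A minimal sketch follows; the cubic used here is purely hypothetical, since the fitted coefficients are not given in this section, and the function names are illustrative.

```python
def average_rate(model, start_age, end_age):
    """Average PCI loss per year between two ages (positive = deterioration)."""
    return (model(start_age) - model(end_age)) / (end_age - start_age)


# Hypothetical third-order curve, NOT the fitted model of Figure 8a.
def example_model(age):
    return 100 - 0.004 * age ** 3


early = average_rate(example_model, 0, 15)   # about 0.9 PCI points per year
late = average_rate(example_model, 15, 20)   # about 3.7 PCI points per year
print(round(early, 2), round(late, 2), late > early)
```

With the actual fitted coefficients substituted for the hypothetical ones, the same two calls would quantify how much faster the pavement deteriorates after year 15.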
The implication of this methodology is that a country or city with little or no pavement condition data can, first, collect data with a cost-effective and adequately accurate tool, i.e., a smartphone, and, second, build a preliminary pavement performance model that represents the deterioration rate of pavement within the age range of the monitored sample sections.
Such a pavement performance model is subject to errors and would be more accurate if these errors were reduced. For instance, the sparse distribution of the sample segments’ pavement ages introduces errors in model development, i.e., if the sample segments’ ages covered a wider range and were less scattered, the model errors would decrease.
Moreover, if pavement condition data were collected regularly, the additional data would enrich the model. More data can be collected in three ways. First, each segment can be monitored repeatedly over its lifespan, so that it can be compared with itself at different points in time, which could help in developing stage-based models such as Markov chain models. Second, more segments with a wider range of pavement ages can be investigated in each family, so that a performance model can be developed for each family. Finally, more pavement criteria can be acquired: if exact inventory data, traffic loading data, and weather condition data were gathered, a more precise model could be developed. In that case, these data could also be used as independent variables to build multiple regression models, enhancing the models’ performance.
Because no pavement performance models have previously been developed for Afghanistan, the results of this study cannot be compared with earlier work in the same country; however, they can be compared in general terms with similar attempts in other countries to check that the trend of PCI deterioration over time is approximately the same. Compared with the related literature, the pavement performance model developed in this study expressed the same trend as that reported by other researchers [48,49,50]. The similarities between the developed model and the previous work are twofold. First, the pavement degradation rate is lower in the initial ages after construction than at later ages. Such performance models generally degrade sharply after 75% of the life cycle, which can be clearly seen in the model developed in this study in Figure 8a. Second, the most common shape of pavement performance curves in the related literature is an S shape, which is similar to the model developed herein. The S-shaped curve suits pavement performance modeling well, as it can represent two different degradation rates (i.e., lower and higher). The third-order polynomial model proposed in this study follows the same S shape introduced by previous researchers.