Article

3D Reconstruction of Asphalt Pavement Macro-Texture Based on Convolutional Neural Network and Monocular Image Depth Estimation

School of Civil Engineering and Geomatics, Shandong University of Technology, Zibo 255049, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(9), 4684; https://doi.org/10.3390/app15094684
Submission received: 11 March 2025 / Revised: 18 April 2025 / Accepted: 21 April 2025 / Published: 23 April 2025

Abstract

The 3D reconstruction of asphalt pavement macrotexture holds significant engineering value for pavement quality assessment and performance monitoring. However, conventional 3D reconstruction methods face challenges, such as high equipment costs and operational complexity, limiting their widespread application in engineering practice. Meanwhile, current deep learning-based monocular image reconstruction for pavement texture remains in its early stages. To address these technical limitations, this study systematically prepared four types of asphalt mixture specimens (AC, SMA, OGFC, and PA) with a total of 14 gradations. High-precision equipment was used to simultaneously capture 2D RGB images and 3D RGB-D point cloud data of the surface texture. An innovative multi-scale feature fusion CNN model was developed based on an encoder–decoder architecture, along with an optimized training strategy for model parameters. For performance evaluation, multiple metrics were employed, including root mean square error (RMSE = 0.491), relative error (REL = 0.102), and accuracy at different thresholds (δ = 1/2/3: 0.931, 0.979, 0.990). The results demonstrate strong correlations between the reconstructed texture’s mean texture depth (MTD) and friction coefficient (f8) with actual measurements (0.913 and 0.953, respectively), outperforming existing methods. This confirms that the proposed CNN model achieves precise 3D reconstruction of asphalt pavement macrotexture, effectively supporting skid resistance evaluation. To validate engineering applicability, field tests were conducted on pavements with various gradations. The model exhibited excellent robustness under different conditions. Furthermore, based on extensive field data, this study established a quantitative relationship between MTD and friction coefficient, developing a more accurate pavement skid resistance evaluation system to support maintenance decision-making.

1. Introduction

Asphalt pavement is in direct contact with tires, and its texture affects the friction that is essential for safe driving [1]. According to the International Organization for Standardization (ISO), pavement texture is divided by wavelength range into micro-texture (0.001–0.5 mm), macro-texture (0.5–50 mm) and larger-scale structure [2]. Among these, macro-texture plays a key role in determining the skid resistance and noise performance of asphalt pavement by influencing the hysteresis force and low-frequency noise between the tire and the pavement [3]. Therefore, developing an accurate and efficient method for reconstructing the macro-texture of asphalt pavement is crucial for road condition monitoring and pavement performance evaluation [4].
The 3D reconstruction of asphalt pavement texture is essential for evaluating its texture characteristics. Currently, most equipment used for the 3D reconstruction of asphalt pavement texture is industrial-grade; it is expensive and incurs high reconstruction costs, making widespread application difficult. Transforming pavement texture 3D reconstruction equipment from the industrial level to the consumer level is therefore an urgent challenge. Some researchers have attempted to use image-based methods to realize the 3D reconstruction of the macro-texture of asphalt pavement at relatively low cost [5,6]. Based on the number of viewpoints required to reconstruct the measured object, these methods can be categorized into monocular [7], binocular, and multi-view image 3D reconstruction [8]. Compared with monocular image-based 3D reconstruction, multi-view methods suffer from problems such as image calibration, high computational complexity and difficulty in handling dynamic scenes [9]. Furthermore, current research on monocular image 3D reconstruction mainly focuses on indoor and outdoor scenes, and its application in road engineering still requires further study.
In recent years, with the continuous development of machine learning algorithms, deep neural networks have played a crucial role in image classification [10], scene semantic segmentation [11], image super-resolution [12], target detection and recognition [13] and depth prediction, owing to their strong feature learning capabilities. On this basis, to improve the accuracy of predicting depth maps from monocular images, researchers have carried out many studies on network structure [14], loss functions [15] and problem transformation [6], leading to numerous novel algorithms. Significant breakthroughs have been made in monocular image depth estimation based on deep learning and convolutional neural networks. However, monocular visual neural network models and methods for the 3D reconstruction of asphalt pavement macro-texture are still far from optimal, and further research is urgently required.
In view of this, the primary objective of this paper is to develop a novel CNN architecture for the macro-texture reconstruction of asphalt pavements from monocular images, which in turn enables the detection of pavement skid resistance. Firstly, asphalt mixtures of different gradation types are used to establish an asphalt macro-texture RGB-D dataset, a new encoder–decoder network architecture is designed to recover depth information from 2D images of asphalt macro-texture, and a series of training strategies is proposed to optimize the model. Secondly, the accuracy of the reconstructed texture is evaluated using MAE, RMSE, REL and threshold accuracies. Finally, the effectiveness of the reconstructed texture is analyzed based on the MTD and f8 indicators, and the reconstructed texture is shown to be directly applicable to the evaluation of pavement skid resistance. The flowchart is shown in Figure 1.

2. Methods

2.1. Data Set

A comprehensive dataset was established incorporating four asphalt mixture types: asphalt concrete (AC), stone mastic asphalt (SMA), open-graded friction course (OGFC), and porous asphalt (PA). The experimental matrix encompassed 14 distinct aggregate gradations with varied nominal maximum aggregate sizes (NMAS) [16].
1. Four NMASs (4.75, 9.5, 12.5, and 16 mm) were selected for the AC mixture (AC-5, AC-10, AC-13, and AC-16).
2. Three NMASs (all except 4.75 mm) were used for the SMA and OGFC mixtures (SMA-10, SMA-13, SMA-16, OGFC-10, OGFC-13, and OGFC-16).
3. Four NMASs (9.5, 12.5, 16, and 19 mm) were selected for the PA mixture (PA-10, PA-13, PA-16, and PA-20).

2.1.1. Acquisition of Asphalt Macro-Texture Data Set

This study employed the LTS 9400 laser texture scanner developed by AMES Engineering (Ames, IA, USA) to acquire macro-texture point cloud depth data of asphalt pavement. The device generates a complete 3D model by measuring the time elapsed from laser emission to reception, along with the intensity of the reflected laser beam. It is capable of scanning an area of 101.6 × 71.5 mm with a vertical resolution of 0.0246 mm. Specific parameters are detailed in Table 1.
To enhance the quality and efficiency of data collection, the test surface was cleaned prior to sample acquisition and positioned in a relatively flat test area. A light shield was used to divide each test surface into 9 distinct sections, with each section measuring 103 × 73 mm. To further improve the robustness of the network model, the camera was placed randomly at distances ranging from 140 mm to 200 mm above the test surface during sample acquisition to obtain RGB images of different sizes and resolutions, thus increasing the diversity of the data. Subsequently, the acquired 2D images were cropped to match the dimensions of the corresponding depth images from the point cloud. This ensured precise alignment between the asphalt macrotexture 2D images and the depth maps. The process for establishing the asphalt pavement macrotexture dataset is illustrated in Figure 2.

2.1.2. Point Cloud Data Preprocessing

Due to the irregularity of the asphalt surface texture, the light intensity in certain depressed areas falls below the threshold that the laser texture scanner can recognize. These regions fail to provide accurate depth information, leading to the loss of some data in the acquired point cloud. This data gap affects the precise representation of the macro-texture of the asphalt pavement. Compared to other methods, linear interpolation offers advantages such as fast computation, broad applicability, and simplicity, making it an effective approach for filling in missing point cloud data. Therefore, to mitigate the impact of missing points on the reconstruction results, this paper employs the linear interpolation method for data imputation, as shown in Equation (1):
$$Z_{i,j} = \frac{Z_{n,j} - Z_{m,j}}{n - m}\,(i - m) + Z_{m,j}$$
where i is the index of the missing point, j denotes the jth texture contour, m is the index of the nearest valid point before i, n is the index of the valid point closest to i, Z_{i,j} is the interpolated value of point i in texture contour j, Z_{n,j} is the value of sample n in texture contour j, and Z_{m,j} is the value of sample m in texture contour j.
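As a minimal sketch (not the authors' code), the gap filling of Equation (1) can be applied per texture contour; `np.interp` evaluates exactly the two-point linear formula between the nearest valid samples:

```python
import numpy as np

def fill_missing_linear(profile):
    """Fill NaN gaps in a 1D texture profile by linear interpolation, Eq. (1).

    For a missing point i bracketed by the nearest valid points m and n,
    Z[i] = (Z[n] - Z[m]) / (n - m) * (i - m) + Z[m].
    """
    profile = np.asarray(profile, dtype=float)
    idx = np.arange(profile.size)
    valid = ~np.isnan(profile)
    # np.interp computes the same two-point linear formula at the missing indices
    return np.interp(idx, idx[valid], profile[valid])
```

In practice this would be called once per contour j of the scanned point cloud.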
In addition, the spikes observed in the 3D point cloud data of the road surface are abnormal points with significant amplitude, which can introduce errors in the calculation results and negatively affect the computation, such as texture characterization of asphalt pavement. Therefore, this paper adopts the bilateral filtering algorithm to remove the noise. The bilateral filtering algorithm constructs a surface using noisy points and their local neighborhood points, shifts the noisy points along the normal direction, and achieves a smoothing and denoising effect by iteratively adjusting the positions of the noisy points. Compared with other methods, this approach offers advantages such as strong adaptability, broad applicability, and reduced artifacts. The results before and after denoising are shown in Figure 3, and the bilateral filtering formula is shown in Equation (2)
$$B_j' = B_j + \alpha\, n_j$$
where B_j represents the 3D coordinates of the point before filtering; α is the bilateral filtering factor; n_j denotes the normal vector at B_j; and B_j′ represents the 3D coordinates of the point after bilateral filtering.
In the process of collecting point cloud data for asphalt surface texture, it is challenging to keep the laser texture scanner perfectly parallel to the surface of the specimen, leading to a tilt in the overall texture data. This tilt causes significant deviations between the calculated results and the actual values, which in turn affects the accurate evaluation of the surface texture. To address this, this paper proposes a planar model that accounts for the inclination of the pavement texture. By calculating the deviation between the projection of each point on the plane model and its original position, the model corrects the inclined texture data. Commonly used plane fitting methods include the least squares method and the eigenvalue method [17]. The least squares method, however, only considers errors in the elevation (z-direction) and yields unstable corrections across different coordinate systems [18]. Therefore, this paper adopts the eigenvalue method to fit the plane that reflects the inclination of the pavement texture. The method fits a plane ax + by + cz = d by solving for the normal vector {a, b, c} and the parameter d, so that the distance from each 3D point (x_j, y_j, z_j) (j = 1, 2, …, n − 1) to the plane is minimized. Since the normal vector {a, b, c} must satisfy a² + b² + c² = 1, the problem can be transformed into minimizing the function f using the Lagrange multiplier method [19], as shown in Equation (3).
$$f = \sum_{j=1}^{n-1} \left( a x_j + b y_j + c z_j - d \right)^2 - \lambda \left( a^2 + b^2 + c^2 - 1 \right)$$
where λ is the Lagrange multiplier; j is the number of 3D point cloud data points; and n is the total number of three-dimensional point cloud data.
It can be observed from the above discussion that the plane fitted by the eigenvalue method along the three directions of x, y and z can maintain good stability in different coordinate systems. Therefore, this paper employs the eigenvalue method to address the tilt issue in the 3D point cloud data. The steps are as follows:
First, taking the partial derivative of Equation (3) with respect to d and setting it to zero gives
$$d = a E_{x_j} + b E_{y_j} + c E_{z_j}$$
where E_{x_j}, E_{y_j}, E_{z_j} are the mean values of the coordinates x_j, y_j, z_j, and (E_{x_j}, E_{y_j}, E_{z_j}) is the mean vector of the points (x_j, y_j, z_j).
The coordinates of each 3D data point were subtracted from the mean vector to obtain the centered data matrix. This matrix was then transposed and multiplied, with the results of all the sample data summed to calculate the covariance matrix (Cov), as shown in Equation (5):
$$\mathrm{Cov}(x_j, y_j, z_j) = \sum_{j=1}^{n} \begin{bmatrix} (x_j - E_{x_j})(x_j - E_{x_j}) & (x_j - E_{x_j})(y_j - E_{y_j}) & (x_j - E_{x_j})(z_j - E_{z_j}) \\ (x_j - E_{x_j})(y_j - E_{y_j}) & (y_j - E_{y_j})(y_j - E_{y_j}) & (y_j - E_{y_j})(z_j - E_{z_j}) \\ (x_j - E_{x_j})(z_j - E_{z_j}) & (y_j - E_{y_j})(z_j - E_{z_j}) & (z_j - E_{z_j})(z_j - E_{z_j}) \end{bmatrix}$$
The minimum eigenvalue λmin of the covariance matrix Cov(x_j, y_j, z_j) and its corresponding eigenvector (a, b, c) are then solved for. Substituting the eigenvector into Equation (4) yields the parameter d and determines the fitted plane equation ax + by + cz = d. Finally, the 3D point cloud coordinates are corrected:
$$Z_j = z_j + \frac{a x_j + b y_j - d}{c}$$
where Z_j is the corrected elevation of the jth point, and z_j is the elevation of the jth point before correction.
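The tilt-correction pipeline of Equations (3)–(6) can be sketched compactly: the plane normal is the eigenvector of the centered scatter matrix with the smallest eigenvalue, d follows from the mean vector, and the fitted plane height is subtracted from each elevation. This is an illustrative reimplementation, not the authors' code:

```python
import numpy as np

def correct_tilt(points):
    """Remove the overall tilt of a texture point cloud via the eigenvalue
    plane fit (Eqs. (3)-(5)) and elevation correction (Eq. (6)).

    points: (n, 3) array of (x, y, z). Returns corrected elevations.
    """
    pts = np.asarray(points, dtype=float)
    mean = pts.mean(axis=0)                      # (E_x, E_y, E_z)
    centered = pts - mean
    cov = centered.T @ centered                  # 3x3 scatter (covariance) matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
    a, b, c = eigvecs[:, 0]                      # normal = smallest-eigenvalue vector
    d = a * mean[0] + b * mean[1] + c * mean[2]  # Eq. (4)
    # Subtract the fitted plane's height at each (x, y) from the elevation
    z_plane = (d - a * pts[:, 0] - b * pts[:, 1]) / c
    return pts[:, 2] - z_plane
```

For points lying exactly on a tilted plane, the corrected elevations are all zero; real texture data keeps its relief while the global tilt is removed.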
The macro-texture depth maps of the asphalt surface before and after correction are shown in Figure 4.
Because the collected point cloud depth maps are finer than needed to accurately reconstruct the asphalt texture, the macro-texture depth maps are downsampled to a resolution of 208 × 144 to reduce the burden of subsequent processing and modeling and to improve efficiency.

2.1.3. Image Augmentation

To enhance the model’s generalization ability, 365 sets of cropped RGB images were randomly divided into training samples (255 groups), test samples (73 groups) and validation samples (37 groups) in a 7:2:1 ratio. All three sample types were augmented using techniques such as mirror flipping, horizontal flipping, vertical flipping, random brightness adjustment, and noise perturbation, yielding a total of 2190 image pairs. The results are shown in Figure 5.
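A key detail of RGB-D augmentation is that geometric transforms must be applied identically to the image and its depth map, while photometric changes only touch the RGB image. A minimal sketch of one such paired augmentation (illustrative only; the exact parameters used in the paper are not specified):

```python
import numpy as np

def augment_pair(rgb, depth, rng):
    """One random augmentation of an (RGB, depth) pair, as in Section 2.1.3.

    Flips are applied to both arrays so pixel alignment is preserved;
    brightness jitter and noise perturb the RGB image only.
    """
    out_rgb, out_depth = rgb.copy(), depth.copy()
    if rng.random() < 0.5:                                   # horizontal flip
        out_rgb, out_depth = out_rgb[:, ::-1], out_depth[:, ::-1]
    if rng.random() < 0.5:                                   # vertical flip
        out_rgb, out_depth = out_rgb[::-1, :], out_depth[::-1, :]
    # Photometric perturbations (factors here are assumptions for illustration)
    out_rgb = np.clip(out_rgb * rng.uniform(0.8, 1.2), 0, 255)
    out_rgb = np.clip(out_rgb + rng.normal(0, 2.0, out_rgb.shape), 0, 255)
    return out_rgb, out_depth
```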

2.2. Introduction of Network Architecture

2.2.1. Introduction of CNN and CNN-Based Depth Estimation

CNNs mimic the visual cortex system of animals to recognize object features in images. In CNNs, filters act as cells simulating the visual cortex, while stacked convolutional layers capture different receptive fields, enabling the network to identify target features and their spatial locations. Over the past few years, CNNs have rapidly advanced and become widely used in various computer vision tasks due to their superior accuracy and stability in feature extraction compared to traditional methods. This section introduces the basic theory of CNNs, including key components, such as the convolutional layer, pooling layer, and activation layer.
The convolutional layer consists of multiple convolutional kernels, each a fixed-size matrix with adjustable parameters. These kernels slide across the feature map, moving from left to right and top to bottom with a specified stride, covering local regions to extract relevant features. Within each region, the kernel performs a weighted sum of the data, producing a feature map for that area. The convolutional kernel focuses on local feature extraction and maintains constant weights during the sliding process, which enables local connectivity and weight sharing. The convolutional computation can be expressed by the following equation:
$$S(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(i + m, j + n)\, K(m, n) + b$$
where the output image S(i, j) is called the feature map; the elements of the convolution kernel K are called the weights, and b is the bias. In addition, the number of pixels s by which the kernel moves at each step across the input image is called the stride.
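The sliding weighted sum of Equation (7) can be written directly as a loop, which makes the local connectivity and weight sharing explicit (the same kernel is reused at every position). A sketch, not a production implementation:

```python
import numpy as np

def conv2d(image, kernel, bias=0.0, stride=1):
    """Valid 2D convolution per Eq. (7):
    S(i, j) = sum_m sum_n I(i + m, j + n) K(m, n) + b.
    (This is cross-correlation, the form actually used in CNN frameworks.)
    """
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel) + bias   # weighted sum + bias
    return out
```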
The pooling layer performs a pooling operation on the input data, reducing its size while retaining as much relevant information as possible. Pooling aims to compress data: it suppresses low-response signals and reduces noise, while also decreasing the number of parameters to be learned, thereby reducing the network’s size. Additionally, pooling enlarges the receptive field, enabling the use of smaller convolutional kernels to capture larger-scale features.
The key parameters in the pooling operation are the pooling kernel and stride. Typically, the pooling kernel is square to ensure equal sampling in both dimensions. Given an input image with dimensions Iw × Ih, a pooling kernel of size k × k, and a stride s, the output image dimensions after pooling can be calculated as follows:
$$W_{\text{output}} = \left\lfloor \frac{I_w - k}{s} \right\rfloor + 1, \qquad H_{\text{output}} = \left\lfloor \frac{I_h - k}{s} \right\rfloor + 1$$
The pooling layer parameters are hyperparameters, manually selected and not learned during training. The most common values for the pooling kernel size and stride are both set to 2, typically for non-overlapping pooling. Based on the operation applied within the pooling kernel, the two main types of pooling are max pooling and average pooling.
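The output-size rule of Equation (8) and a non-overlapping max pooling pass can be sketched as follows (illustrative, not the authors' code):

```python
import numpy as np
from math import floor

def pool_output_size(iw, ih, k, s):
    """Output dimensions after pooling, Eq. (8)."""
    return floor((iw - k) / s) + 1, floor((ih - k) / s) + 1

def max_pool(x, k=2, s=2):
    """Max pooling with kernel k and stride s (non-overlapping when k == s)."""
    oh = (x.shape[0] - k) // s + 1
    ow = (x.shape[1] - k) // s + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i*s:i*s+k, j*s:j*s+k].max()   # strongest response wins
    return out
```

With k = s = 2, a 208 × 144 feature map halves to 104 × 72, matching the decoder resolutions discussed later.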
The function in the activation layer that performs nonlinear mapping is called the activation function. It must be nonlinear, though not all nonlinear functions are suitable for this purpose. For efficient backpropagation in neural networks, the activation function must be differentiable and should avoid issues such as gradient explosion or vanishing gradients. In convolutional neural networks, the activation layer typically follows the convolutional layer, with its input being a two or multi-dimensional matrix. The activation function is applied element-wise to the input matrix, without altering its dimensionality. Common activation functions include Sigmoid, Tanh, ReLU, and other ReLU-like functions.
The Sigmoid function, also known as the Logistic function, is mathematically expressed as
$$\sigma(x) = \frac{1}{1 + \exp(-x)}$$
The Tanh function, also known as the hyperbolic tangent function, is similar to the Sigmoid function. However, its output range is shifted from (0, 1) to (−1, 1), and the derivative range is extended to (0, 1). The mathematical expression for the Tanh function is
$$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$
The ReLU (Rectified Linear Unit) function, also known as the Modified Linear Unit or Linear Rectifier, is a piecewise function with the following mathematical expression:
$$\mathrm{Rectifier}(x) = \begin{cases} x, & x \geq 0 \\ 0, & \text{otherwise} \end{cases}$$
ReLU-like functions: Although the ReLU activation function is not flawless, its advantages make it widely adopted. To address the “neuron death” problem of ReLU, researchers have proposed several improvements, primarily modifying the behavior for x < 0.
The Leaky ReLU function modifies the x < 0 region by using a small, non-zero slope instead of setting it to zero. Its mathematical expression is
$$f(x) = \max(\alpha x, x)$$
where α is a constant. If α is set too small, the function approaches the ReLU function and the modification has little effect; if α is set too large, the function loses the ability to sparsify the input data and its nonlinearity is weakened.
(1)
PReLU function
To improve the selection of α, the PReLU (Parametric ReLU) function was proposed. In this function, α is treated as a learnable parameter that is updated during training to adapt to the data and the network.
(2)
RReLU function
The RReLU (Randomized Leaky ReLU) is a variant of the Leaky ReLU function. During network training, it generates a random α from a Gaussian distribution for each activation, which is fixed during testing.
(3)
ELU function
None of the three variants above address the issue of producing zero-mean outputs. As a result, the ELU (Exponential Linear Unit) function was introduced, with the following mathematical expression:
$$f(x) = \begin{cases} x, & x \geq 0 \\ \alpha \left( e^x - 1 \right), & \text{otherwise} \end{cases}$$
The ELU function employs a nonlinear expression with soft saturation for x < 0. Moreover, its negative outputs push the mean activation closer to zero, which aids in accelerating convergence during training.
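The activation functions of Equations (9)–(13) can be summarized in a few lines of numpy (a sketch for reference, not the network code itself):

```python
import numpy as np

def sigmoid(x):                    # Eq. (9): squashes to (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):                       # Eq. (10): squashes to (-1, 1)
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

def relu(x):                       # Eq. (11): zero for negative inputs
    return np.where(x >= 0, x, 0.0)

def leaky_relu(x, alpha=0.01):     # Eq. (12): small non-zero slope for x < 0
    return np.maximum(alpha * x, x)

def elu(x, alpha=1.0):             # Eq. (13): soft saturation toward -alpha
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))
```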

2.2.2. Encoder–Decoder Network Architecture

The monocular visual neural network proposed in this paper consists of an encoder–decoder deep neural network. The network model is composed of four modules: encoder, decoder, feature fusion module and refinement modules, as shown in Figure 6.
Taking the macroscopic texture depth map of asphalt as the supervisory signal, the RGB image is directly input into the CNN model, and the network structure is shown in Table 2.
The encoder uses an improved ResNet101 as the feature extractor to downsample the input image, extract details and multi-scale features at different resolutions, and pass the feature map output by the final downsampling module to the decoder. The encoder consists of 33 residual blocks, each including convolutional layers with sizes of 1 × 1, 3 × 3, and 1 × 1. It employs skip connections to link the input feature signal with the output feature after convolution. This method preserves the original features while incorporating deeper, more complex features. The residual structure is shown in Figure 7a.
The encoder of the proposed CNN differs from ResNet101 in several ways. Specifically, the 3 × 3 convolutional layers in the last residual block (Residual Block 4) are replaced with dilated convolution layers with a dilation rate of 2, which increases the resolution of the output feature map from 1/32 to 1/16 of the input image. Additionally, the final average pooling layer, fully connected layer, and SoftMax layer are removed, and Residual Block 4 is directly connected to the decoder.
The decoder consists of one convolutional layer and four up-sampling layers, labeled Conv 2, Up1, Up2, Up3 and Up4, respectively. The feature map doubles in size with each upsampling operation. After four upsampling steps, the output feature map has dimensions of 208 × 144 × 64. Using a standard encoder-decoder network typically results in the loss of complex texture information [20]. To address this, a refinement module is introduced in this paper. First, edge information from the 2D image is extracted using the Sobel operator. Next, a 3 × 3 convolutional layer (Conv 4) is used to embed the edge features of the pavement texture image into the refinement module. Finally, these edge features are integrated with the feature map produced by the feature fusion layer and the decoder through the Conv 5, Conv 6, and Conv 7 convolutional layers. The final depth map is then output.
The feature fusion module includes four upsampling layers: Up5, Up6, Up7, and Up8. The feature maps at four different scales, generated by the encoder’s downsampling, are upsampled by factors of 2, 4, 8, and 16, respectively. Each output feature map has dimensions of 208 × 144 × 16. Channel transformation is then applied via convolutional layers to produce the output of the feature fusion layer.
In this model, except for the last convolutional layer (Conv 7), a batch normalization module and an activation function are applied after every convolutional layer, dilated convolutional layer, and transposed convolutional layer, as shown in Figure 7b.
To evaluate the performance of the incorporated feature fusion module and refinement module, the CNN model was trained and tested on the RGB-D dataset alongside the original model. The testing results are illustrated in Figure 8.
As depicted in Figure 8, as the number of iterations increases, the loss values of both the ResNet model and the improved model exhibit three distinct phases: a rapid decline, a gradual decline, and a stabilization phase. Notably, the evaluation metrics of the model enhanced with the feature fusion and refinement modules consistently outperform those of the original model across all iteration stages. Furthermore, the improved model demonstrates smaller curve fluctuations compared to the original model, indicating a more stable overall detection performance.

2.3. Training Strategy

2.3.1. Loss Function

In image regression, a commonly used loss function is the BerHu function, which combines the mean square error and absolute error losses and contributes to more stable network training [21]. The BerHu loss consists of two parts: an L1 norm term and an L2 norm term. When the difference between the predicted and actual values is small, the L1 norm is applied; when the difference is large, the L2 norm is used. Accordingly, this paper employs the BerHu loss function to measure the difference between the predicted and actual values, as shown in Equation (14):
$$L_D(y_i, y_i') = \begin{cases} \left| y_i - y_i' \right|, & \left| y_i - y_i' \right| \leq \delta \\ \dfrac{\left( y_i - y_i' \right)^2 + \delta^2}{2\delta}, & \text{otherwise} \end{cases}$$
where y_i and y_i′ denote the predicted value and the true value, respectively, and δ = 0.2 max|y_i − y_i′|.
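A minimal numpy sketch of the BerHu loss of Equation (14), assuming the threshold is recomputed per batch from the maximum absolute error (as the definition of δ implies):

```python
import numpy as np

def berhu_loss(pred, target, c=0.2):
    """BerHu (reverse Huber) loss, Eq. (14): L1 below the threshold delta,
    scaled L2 above it, with delta = c * max|pred - target|."""
    diff = np.abs(pred - target)
    delta = c * diff.max()
    if delta == 0.0:                 # perfect prediction: loss is zero
        return 0.0
    l2 = (diff ** 2 + delta ** 2) / (2.0 * delta)
    return float(np.where(diff <= delta, diff, l2).mean())
```

Note the two branches meet at |diff| = δ (both equal δ there), which keeps the loss continuous and its gradient smooth across the switch.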
To improve the performance of the model, the Adam algorithm was used to optimize the model parameters. In addition, 100 epochs were used during training; each epoch contains 60 mini-batches, with an initial learning rate ε of 0.01, a decay rate ρ of 0.999, and a momentum coefficient μ of 0.9. Meanwhile, to improve training efficiency, a parameter initialization strategy was used to accelerate the convergence of the model.

2.3.2. Transfer Learning

Transfer learning is a machine learning technique that transfers knowledge gained in one domain to a related domain to improve learning performance. This approach can significantly enhance the learning efficiency of new tasks, reduce training costs, and improve the model’s generalization ability. Due to limitations in time and resources, even after expanding the original dataset to 2190 pairs, it remains relatively small compared with established datasets. Therefore, this paper adopts transfer learning using the NYU Depth V2 dataset.
The publicly available NYU-Depth V2 dataset is derived from video sequences of various indoor scenes recorded by Microsoft Kinect’s RGB and depth cameras. A total of 2413 image pairs from the NYU-Depth V2 dataset were used in this study. The depth values range from 0.5 to 10 m, and the dataset was augmented to 9652 image pairs. Of these, 6756 pairs were used for training, 1930 for testing, and 966 for validation. Additionally, the resolution of the RGB-D images was downsampled to 208 × 144 to ensure compatibility with the proposed CNN model.
The CNN is trained on the NYU-Depth V2 dataset for 100 epochs using a segmented constant learning-rate decay, with an initial learning rate of 0.01; the learning rate is reduced by a factor of 0.1 every 1500 iterations. The resulting pre-trained weights are then used as the initial weights, and the CNN is fine-tuned on the pavement texture RGB-D dataset.
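The segmented constant decay schedule described above amounts to multiplying the learning rate by 0.1 after every 1500-iteration segment; a one-line sketch:

```python
def step_decay_lr(iteration, base_lr=0.01, drop=0.1, step=1500):
    """Segmented constant learning-rate decay: the rate is multiplied by
    `drop` at the end of every `step` iterations, constant in between."""
    return base_lr * (drop ** (iteration // step))
```

So iterations 0–1499 train at 0.01, iterations 1500–2999 at 0.001, and so on.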

2.4. Reconstruction of 3D Pavement Macro Texture Evaluation

2.4.1. Accuracy Evaluation Index

In the monocular image depth estimation task, the commonly used accuracy evaluation metrics are Root Mean Squared Error (RMSE), Mean Relative Error (REL), Mean Absolute Error (MAE), and threshold accuracies [22]. The formulas are shown in Equations (15)–(18):
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( d_i - d_i^* \right)^2}$$

$$\mathrm{REL} = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| d_i - d_i^* \right|}{d_i^*}$$

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| d_i - d_i^* \right|$$

$$\max\left( \frac{d_i}{d_i^*}, \frac{d_i^*}{d_i} \right) = \delta < thr, \qquad thr = 1.25,\ 1.25^2,\ 1.25^3$$
where d_i denotes the predicted depth value of pixel i; d_i* denotes the true depth value of pixel i; and N is the total number of pixels in the image.
Smaller values of RMSE, REL, and MAE indicate better model performance, as the reconstructed texture more closely matches the real texture. A larger accuracy value signifies that a greater number of point pixels have predicted depth values within the specified error range, thereby reflecting a higher accuracy of the obtained depth map.
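The four metrics of Equations (15)–(18) can be computed together from a predicted and a ground-truth depth map; a minimal sketch (assuming strictly positive depth values, as required by the ratio test):

```python
import numpy as np

def depth_metrics(pred, gt):
    """RMSE, REL, MAE and threshold accuracies, Eqs. (15)-(18)."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    diff = pred - gt
    rmse = np.sqrt(np.mean(diff ** 2))
    rel = np.mean(np.abs(diff) / gt)
    mae = np.mean(np.abs(diff))
    ratio = np.maximum(pred / gt, gt / pred)          # always >= 1
    # Fraction of pixels whose ratio beats each threshold 1.25^k
    acc = {f"delta{k}": float(np.mean(ratio < 1.25 ** k)) for k in (1, 2, 3)}
    return rmse, rel, mae, acc
```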

2.4.2. Effectiveness Evaluation

Fourteen types of asphalt mixture test plates were used as research objects. Four scanning positions were randomly selected on each test plate for the calculation of MTD and f8, yielding 56 sets of experimental data for the validity analysis of the reconstructed asphalt macro-texture. If the reconstructed texture features correlate significantly with the measured features, this indicates that the reconstructed macro-texture is consistent with the actual pavement macro-structure [23].
The evaluation metrics for the effectiveness of pavement macro-texture include arithmetic mean height, root mean square height, kurtosis, skewness, and mean texture depth. Miao et al. [24] studied 14 feature indexes of the Grey Level Co-occurrence Matrix (GLCM) and demonstrated that the f8 (Sum Entropy) index has a strong correlation with MTD. Therefore, this paper uses MTD and f8 to evaluate the effectiveness of the reconstructed asphalt macro-texture. The Grey Level Co-occurrence Matrix gives the probability that a pixel at an offset of (dx, dy) from a pixel with gray level i has gray level j, as shown in Equation (19). The calculation formulas for f8 are provided in Equations (20) and (21):
$$P(i, j \mid d, \theta) = \#\left\{ (x, y) \mid f(x, y) = i,\ f(x + d_x, y + d_y) = j;\ x, y = 0, 1, 2, \ldots, N - 1 \right\}$$
where d is the spatial distance between the two pixels (set to 1 in this paper); θ is the sampling direction (90° in this paper); # denotes the number of elements in the set; and i, j = 0, 1, 2, …, L − 1:
$$f_8 = - \sum_{k=2}^{2 N_g} P_{x+y}(k) \log P_{x+y}(k)$$

$$P_{x+y}(k) = \sum_{i=1}^{N_g} \sum_{\substack{j=1 \\ i + j = k}}^{N_g} P_{d\theta}(i, j)$$
where P_{dθ}(i, j) is the probability of occurrence of the pixel pair (i, j), and N_g is the maximum gray level of the gray image.
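As an illustrative sketch of Equations (19)–(21) (not the authors' code; the exact 90° offset convention and gray-level indexing are assumptions), the GLCM for vertically adjacent pixel pairs can be accumulated and the sum entropy read off from the distribution of i + j:

```python
import numpy as np

def sum_entropy_f8(gray, levels):
    """GLCM sum entropy f8 at distance 1 with a vertical (90 deg) offset.

    gray: 2D integer image with values in [0, levels).
    """
    # Eq. (19): count co-occurrences of gray levels one row apart
    glcm = np.zeros((levels, levels))
    a, b = gray[:-1, :], gray[1:, :]
    for i, j in zip(a.ravel(), b.ravel()):
        glcm[i, j] += 1
    glcm /= glcm.sum()                             # normalise to probabilities
    # Eqs. (20)-(21): P_{x+y}(k) sums glcm over anti-diagonals i + j = k
    ksum = np.add.outer(np.arange(levels), np.arange(levels))
    f8 = 0.0
    for k in range(0, 2 * levels - 1):
        p = glcm[ksum == k].sum()
        if p > 0:
            f8 -= p * np.log(p)                    # entropy of the sum distribution
    return f8
```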
The mean texture depth is measured using the sand patch method. As shown in Figure 9, a measuring cylinder with a capacity of 40 cm³ is filled with dry sand. Four positions are randomly selected, and the dry sand is spread evenly over a rectangular area of the paving plate. The remaining volume of dry sand in the cylinder, V′, is recorded. The ratio of the volume of sand used to cover the rectangular area to the covered area gives the mean texture depth of that area. The calculation formula is shown in Equation (22):
$$\mathrm{MTD} = \frac{(40 - V') \times 1000}{L \times W}$$
where MTD represents the mean texture depth of the asphalt pavement; V′ is the volume of sand remaining in the cylinder; and L and W denote the length and width of the dry sand coverage area.
Through the proposed CNN model, the 3D model of the asphalt macro-texture can be reconstructed, and the mean texture depth can be calculated directly from the pavement elevation data in the 3D model. The calculation formula is shown in Equation (23):
MTD′ = [Σ_{j=1}^{M} Σ_{i=1}^{N} (ε·Zmax − zij) × s] / S    (23)
where MTD′ is the mean texture depth of the macro-texture calculated from the 3D model; S is the area of the reconstructed texture corresponding to the sand-covered area; s is the area of each scanning point; i indexes the pixels within a texture profile and j indexes the texture profiles (N pixels per profile, M profiles); Zmax is the maximum depth value of the reconstructed texture; zij is the depth value of the i-th pixel in the j-th texture profile; and ε is the correction factor applied to Zmax, taken as 0.8 in this paper.
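Equation (23) can be evaluated directly on the reconstructed depth map (a minimal NumPy sketch; the function name and area arguments are illustrative, and the sign convention follows the equation, i.e., depths are measured against the corrected reference ε·Zmax):

```python
import numpy as np

def model_mtd(depth, point_area, total_area, eps=0.8):
    """MTD' from a reconstructed depth map, Equation (23).

    depth      : 2D array of depth values z_ij from the 3D model
    point_area : area s covered by one scanning point
    total_area : area S matching the sand-covered region
    eps        : correction factor on Z_max (0.8 in the paper)
    """
    ref = eps * depth.max()            # corrected reference plane eps * Z_max
    return np.sum(ref - depth) * point_area / total_area
```

With eps = 1.0 this reduces to the mean gap between the highest point and the surface, scaled by the point-to-total area ratio.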

3. Results and Discussion

3.1. Performance on Different Data Sets

To demonstrate the advantage of the proposed CNN over other network models, its performance when trained on the NYU Depth V2 dataset and on the asphalt macro-texture RGB-D dataset was compared with publicly available research results, as shown in Table 3.
According to Table 3, the RMSE, REL, and accuracies at δ = 1, δ = 2, and δ = 3 on the NYU Depth V2 dataset for the proposed CNN are 0.630, 0.135, 0.801, 0.951, and 0.986, respectively. For the asphalt macro-texture dataset, the corresponding values are 0.491, 0.102, 0.931, 0.979, and 0.990. Compared with the other models, the proposed CNN achieves lower errors and higher accuracy, indicating superior performance for accurate asphalt macro-texture reconstruction. These results also demonstrate that the established RGB-D asphalt pavement texture dataset is effective and suitable for training the network model.
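For reference, the error and accuracy metrics reported in Table 3 can be computed as follows (a minimal NumPy sketch; the threshold base of 1.25 for the δ = n accuracies follows the common monocular-depth convention and is an assumption here, as the paper does not restate it):

```python
import numpy as np

def depth_metrics(pred, gt, base=1.25):
    """RMSE, REL, and threshold accuracies for depth estimates.

    delta_n is the fraction of pixels whose ratio max(pred/gt, gt/pred)
    falls below base**n (n = 1, 2, 3) -- the usual convention in
    monocular depth estimation benchmarks.
    """
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))        # root mean square error
    rel = np.mean(np.abs(pred - gt) / gt)            # mean relative error
    ratio = np.maximum(pred / gt, gt / pred)
    deltas = [np.mean(ratio < base ** n) for n in (1, 2, 3)]
    return rmse, rel, deltas
```

Lower RMSE/REL and higher δ accuracies indicate better depth prediction, which is how the rows of Table 3 are compared.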

3.2. Comparison of Different Training Results

A comparative visualization of the training trajectories for the proposed CNN architecture with and without transfer learning is presented in Figure 10.
The proposed CNN was trained and tested on the asphalt macro-texture RGB-D dataset, and the change in loss with the number of iterations is shown in Figure 10. The loss curves generally exhibit three stages: a rapid decrease, a slow decrease, and a plateau, and both training strategies, with and without transfer learning, converge well. The baseline CNN converges at epoch 20, whereas its transfer learning counterpart requires 30 epochs to stabilize. Two factors explain this: the depth distributions of the NYU Depth V2 dataset and the asphalt macro-texture RGB-D dataset created in this paper differ substantially, and the depth units of the two datasets are not of the same order of magnitude. In addition, when the domain gap between two tasks is large, the knowledge acquired through transfer learning may not apply to the new task and can negatively affect training [29]. The strategy without transfer learning was therefore adopted in the subsequent study.

3.3. Evaluation of Macroscopic Texture Accuracy of Asphalt After Reconstruction

The experimental protocol incorporates stratified subgroup analysis of the road texture RGB-D test set, categorized by NMAS and aggregate gradation, to systematically evaluate the CNN's predictive capability across asphalt mixture types. As depicted in Figure 11, the evaluation metrics (MAE, RMSE, REL) reveal a clear accuracy gradient among gradation categories. The dense-graded AC and SMA mixtures show the best predictive performance: AC mixtures maintain MAE below 0.7, RMSE under 1.18, and REL within 0.85, while SMA mixtures are even more stable, with an average MAE of 0.68, all RMSE values below 1.1, and REL around 0.85. OGFC mixtures show slightly reduced accuracy: MAE remains below 0.8 and RMSE does not exceed 1.18, but REL approaches 1. PA mixtures exhibit markedly higher errors across the board, with MAE fluctuating between 0.9 and 1.35, RMSE ranging from 1.4 to 1.8, and REL reaching 1.3–1.7. This performance gap highlights the limitations of the model when applied to PA mixtures.
Two factors explain this behaviour. On the one hand, AC and SMA asphalt mixtures are made of aggregates that form smaller surface pores, whereas OGFC and, especially, PA mixtures have larger pores; PA mixtures are designed with a porosity greater than 18%, so the proportion of fine macro-texture is relatively small, which biases the CNN toward extracting coarse macro-texture features during training at the expense of fine ones. On the other hand, the number of PA-type samples is small compared with the other, lower-porosity mixtures, so the features the network learns for PA textures are less complete than for the other three mixture types. In summary, the proposed CNN accurately reconstructs the macro-texture of AC, SMA, and OGFC mixtures, while its accuracy on PA mixtures is slightly lower.
To show the reconstruction effect of the proposed CNN more intuitively, Figure 12 presents the reconstruction results for the four mixture types: Figure 12a shows the input two-dimensional image, Figure 12b the reconstructed macro-texture depth map, and Figure 12c the measured macro-texture depth map. As Figure 12 shows, the proposed CNN accurately reconstructs the macro-texture of AC, SMA, OGFC, and PA asphalt mixtures.

3.4. Pavement Performance Evaluation

The correlation analysis between the MTD value of the reconstructed asphalt macro-texture and the results measured by the sanding method and the f8 index is shown in Figure 13.
Figure 13a shows the correlation between the MTD measured by the sanding method and the MTD values of the asphalt macro-texture reconstructed using the proposed CNN. The MTD values for both methods are closely aligned along the line y = 0.8417x + 0.4719, with a correlation coefficient of R = 0.9131. Figure 13b presents the correlation between the reconstructed macro-texture’s f8 feature and the actual measured f8. The figure illustrates that the reconstructed f8 has a strong correlation with the actual measured f8, with a correlation coefficient of R = 0.9531. These results demonstrate that the 3D macro-texture with a resolution of 208 × 144 meets the requirements for pavement condition evaluation. The characteristics of the reconstructed pavement macro-texture can be used effectively to assess the corresponding pavement texture. Compared to the traditional sanding method, the proposed 3D reconstruction approach based on a monocular visual depth network is more applicable. Thus, the CNN-based asphalt pavement macro-texture reconstruction method can partially replace existing methods for asphalt macro-texture reconstruction.

4. Case Study

4.1. Field Test Comparison Test of Asphalt Pavement

To systematically evaluate the reconstruction performance of the proposed CNN model for asphalt pavements with different gradation types, four typical main roads in Zhangdian District, Zibo (Liuquan Road, Xincun West Road, West Second Road, and Gongqingtuan Road) were selected as study subjects. These roads cover three asphalt pavement structure types: dense-graded (0.5–1.5 mm), semi-open-graded (1.5–2 mm), and open-graded (2–3.0 mm). Based on the collected pavement image data, a three-dimensional reconstruction of the asphalt surface texture was performed, and the results were compared with sand patch test measurements to validate the applicability and reliability of the model. A 500-m section of each gradation type was selected as the analysis area for comparing reconstructed and measured values, with measurement points arranged at 10-m intervals along the driving trajectory, giving 50 measurement points in total.
High-resolution industrial-grade cameras were employed for image acquisition to ensure exceptional clarity and detailed representation. The captured images were uniformly cropped to a resolution of 208 × 144 pixels, a size selected to retain sufficient detail while effectively managing data volume, thereby mitigating excessive computational overhead.
The preprocessed images were input into the proposed CNN model for three-dimensional reconstruction. The reconstructed results were then compared with the Mean Texture Depth measured via the sand patch test, and the reconstruction outcomes for different gradation types are summarized in Table 4, Table 5 and Table 6.
According to the test results in Table 4, the Relative Error (REL) of the MTD field measurements was 7.64%, and the RMSE was 5.8%. At thresholds of 1, 2, and 3, the detection accuracies reached 93.5%, 95.3%, and 98.1%, respectively. Both the error and accuracy of the MTD′ and MTD detection results met the expected requirements, demonstrating that the proposed CNN model achieves high accuracy and reliability in reconstructing the texture of dense-graded asphalt pavements.
According to the test results in Table 5, the maximum REL of the Mean Texture Depth field measurements was 5.42%, and the RMSE was 5.5%. At thresholds of 1, 2, and 3, the detection accuracies reached 94.03%, 95.97%, and 98.89%, respectively. Both the error and accuracy of the MTD′ and MTD detection results met the expected requirements, demonstrating that the proposed CNN model achieves high precision and reliability in texture depth reconstruction for asphalt pavements.
As shown in Table 6, the maximum REL for the MTD field measurements was 5.15%, and the RMSE was 6.9%. At thresholds of 1, 2, and 3, the detection accuracies reached 92.76%, 95.02%, and 97.86%, respectively. While the results met the detection requirements, the errors were higher compared to those of dense-graded and semi-open-graded asphalt pavements.
In field measurements, image acquisition requires the camera to remain perpendicular to the inspection area. Unlike controlled laboratory environments, road surfaces in practical engineering often exhibit slight inclinations. While image preprocessing and stereo rectification techniques effectively mitigate the impact of these inclinations on detection results, residual errors persist. Consequently, depth detection values derived from the 3D pavement model are slightly higher than those obtained via the sand patch test, as illustrated in Figure 14.
To visually demonstrate the macrotexture reconstruction capability of the proposed CNN model, Figure 15, Figure 16 and Figure 17 present the reconstructed results for dense-graded, semi-open-graded, and open-graded asphalt pavements. The results indicate that the CNN model can effectively reconstruct the macrotexture of real-world asphalt pavements, with reconstruction accuracy fully complying with the detection requirements.

4.2. Field Testing of Skid Resistance

Pavement skid resistance is typically evaluated through sand patch testing and pendulum friction testing, where comprehensive analysis is conducted based on MTD, BPN, and relevant specifications/empirical criteria. While the MTD measurement methodology has been detailed previously, the following briefly explains the BPN testing procedure.
In China, the British Pendulum Tester is employed to measure pavement skid resistance under wet conditions, yielding the BPN friction coefficient. As environmental temperature significantly affects skid resistance measurements, field-obtained BPN values must be corrected using Equation (24) to derive standardized values (FB20) at the reference temperature of 20 °C [30]:
FB20 = FBT + ΔF    (24)
where FB20 is the BPN converted to the standard temperature of 20 °C; FBT is the BPN measured at the actual pavement surface temperature T; T is the measured pavement surface temperature (°C) under wet conditions; and ΔF is the temperature correction factor (selected from Table 7).
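The correction in Equation (24) with the Table 7 values can be sketched as follows (a hedged sketch: linear interpolation between the tabulated temperatures is an assumption of this code, not stated in the specification, and the function name is illustrative):

```python
import bisect

# Temperature correction dF from Table 7 (observed temperature in deg C).
DELTA_F = {0: -6, 5: -4, 10: -3, 15: -1, 20: 0, 25: 2, 30: 3, 35: 4, 40: 7}

def fb20(fbt, temp_c):
    """FB20 = FBT + dF, Equation (24).

    Corrections between the tabulated 5 deg C steps are linearly
    interpolated; out-of-range temperatures are clamped to the table.
    """
    temps = sorted(DELTA_F)
    if temp_c <= temps[0]:
        corr = float(DELTA_F[temps[0]])
    elif temp_c >= temps[-1]:
        corr = float(DELTA_F[temps[-1]])
    else:
        hi = bisect.bisect_right(temps, temp_c)
        t0, t1 = temps[hi - 1], temps[hi]
        f0, f1 = DELTA_F[t0], DELTA_F[t1]
        corr = f0 + (f1 - f0) * (temp_c - t0) / (t1 - t0)
    return fbt + corr
```

At the 10 °C field temperature used in this study, the correction is −3, consistent with the FB10/FB20 pairs in Table 8.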
The British Pendulum Tester was utilized to conduct skid resistance measurements across all 50 predetermined test sections. Field testing was performed at an ambient pavement temperature of 10 °C, with the on-site measurement conditions visually documented in Figure 18.
The British Pendulum Test results for pavement surface friction are presented in Table 8.

4.3. Analysis of Relationship Between MTD and Skid Resistance of Asphalt Pavement

To investigate the correlation between the 3D-reconstructed MTD′ and FB20, least squares regression analysis was performed on the paired datasets. The derived relationship is expressed in Equation (25), where the independent variable x represents MTD′:
FB20 = 3.055x + 56.622    (25)
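The regression itself is straightforward to reproduce (a sketch, not the authors' code; `fit_line` and the sample arrays in the test are illustrative placeholders for the paired MTD′/FB20 field data):

```python
import numpy as np

def fit_line(x, y):
    """Least-squares slope, intercept, and Pearson r for paired data,
    as used to relate the reconstructed MTD' to FB20 in Equation (25)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    slope, intercept = np.polyfit(x, y, 1)   # degree-1 least squares fit
    r = np.corrcoef(x, y)[0, 1]              # Pearson correlation coefficient
    return slope, intercept, r
```

Applied to the paired field data, this procedure yields the slope, intercept, and correlation coefficient reported for Equation (25).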
The relationship between the two datasets from Table 4 and Table 8 is graphically presented in Figure 19 and Figure 20.
The correlation coefficient between the 3D model's MTD′ and the pendulum test results is 0.71, indicating a moderate positive linear correlation. In general, the greater the texture depth, the higher the friction coefficient measured by the pendulum tester, and the better the skid resistance of the asphalt pavement.
Based on the analysis of MTD and friction coefficient, the MTD values are mostly concentrated between 1.50 mm and 2.50 mm, with only a few measurement points showing values that are too low (<1.30 mm) or too high (>2.50 mm). Most measurement points have FB20 values between 62 and 67, while a few points show lower friction coefficients (FB20 < 60).
In summary, the skid resistance performance of this road section is generally good, with most measurement points’ MTD values and friction coefficients falling within relatively optimal ranges. However, a few measurement points (such as Points 10, 41, and 42) show potential skid resistance deficiencies that require special attention and improvement.

5. Conclusions

This study proposes a CNN-based monocular image approach for three-dimensional reconstruction of asphalt pavement macrotexture. The core innovation lies in developing a CNN model that reconstructs pavement macrotexture from two-dimensional images. To validate the model’s accuracy and effectiveness, comparative experiments were conducted against state-of-the-art methods using both our proprietary dataset and public benchmark datasets. Systematic evaluation demonstrates the proposed method’s superior reconstruction precision and practical value for asphalt pavement macrotexture analysis.
(1)
The macroscopic texture RGB-D dataset of asphalt pavement constructed in this paper can be used directly for model training. At the same time, the proposed CNN model shows excellent performance compared with other models on the NYU Depth V2 public dataset.
(2)
The pavement macro-texture reconstructed by the CNN constructed in this paper can be used directly for asphalt pavement inspection. The proposed macro-texture depth map meets the technical requirements of pavement skid resistance testing in terms of resolution (208 × 144 pixels) and measurement accuracy.
(3)
Compared with the traditional sanding method, this method shows significant advantages in reconstruction efficiency and engineering applicability, and can be used as an effective supplement to the existing macroscopic texture detection system for asphalt pavement.
(4)
The current study primarily focuses on 3D macrotexture reconstruction of asphalt pavements under static conditions, while its application in dynamic scenarios (e.g., vehicle-mounted mobile measurement systems) has not yet been explored. Future research will specifically address three critical challenges in dynamic environments: (i) motion blur compensation, (ii) viewpoint variation handling, and (iii) real-time processing requirements. These will be tackled through the implementation of temporal modeling architectures (particularly recurrent neural networks) coupled with advanced motion compensation algorithms, ultimately enabling high-fidelity texture reconstruction for mobile measurement applications.

Author Contributions

Conceptualization, C.Y.; data curation, X.L.; methodology, X.L.; supervision, C.Y.; writing—original draft, X.L.; writing—review and editing, C.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (Grant No. 51808327) and the Natural Science Foundation of Shandong Province (Grant No. ZR2019PEE016).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request. The data are not publicly available due to privacy and anonymity.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Dong, N.; Prozzi, J.A.; Ni, F. Reconstruction of 3D pavement texture on handling dropouts and spikes using multiple data processing methods. Sensors 2019, 19, 278. [Google Scholar] [CrossRef] [PubMed]
  2. ISO13473-1-1997b; Characterization of Pavement Texture by Use of Surface Profiles Part 1: Determination of Mean Profile Depth. ISO: Geneva, Switzerland, 1997.
  3. Yu, M.; You, Z.; Wu, G.; Kong, L.; Liu, C.; Gao, J. Measurement and modeling of skid resistance of asphalt pavement: A review. Constr. Build. Mater. 2020, 260, 119878. [Google Scholar] [CrossRef]
  4. Li, Q.; Zou, Q.; Zhang, D. Road Pavement Defect Detection Using High Precision 3D Surveying Technology. Geomat. Inf. Sci. Wuhan Univ. 2017, 42, 1549–1564. [Google Scholar]
  5. Zhang, X.; Liu, T.; Liu, C.; Chen, Z. Research on skid resistance of asphalt pavement based on three-dimensional laser-scanning technology and pressure-sensitive film. Constr. Build. Mater. 2014, 69, 49–59. [Google Scholar] [CrossRef]
  6. Dan, H.C.; Lu, B.; Li, M. Evaluation of asphalt pavement texture using multi view stereo reconstruction based on deep learning. Constr. Build. Mater. 2024, 412, 134837. [Google Scholar] [CrossRef]
  7. Qi, C.; Liu, W.; Wu, C.; Su, H.; Guibas, L. Frustum pointnets for 3d object detection from rgb-d data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 918–927. [Google Scholar]
  8. Shen, H.; Chai, Y. Summary of Binocular Vision in Computer Vision. Sci. Technol. Inf. 2007, 150–151. [Google Scholar] [CrossRef]
  9. Schwarz, M.; Schulz, H.; Behnke, S. RGB-D object recognition and pose estimation based on pre-trained convolutional neural network features. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 1329–1335. [Google Scholar]
  10. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  11. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
  12. Peng, Y.; Zhang, L.; Zhang, Y.; Liu, S.; Guo, M. Deep deconvolution neural network for image super-resolution. J. Softw. 2017, 29, 926–934. [Google Scholar]
  13. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  14. Grigorev, A.; Jiang, F.; Rho, S.; Sori, W.; Liu, S.; Sai, S. Depth estimation from single monocular images using deep hybrid network. Multimed. Tools Appl. 2017, 76, 18585–18604. [Google Scholar] [CrossRef]
  15. Laina, I.; Rupprecht, C.; Belagiannis, V.; Tombari, F.; Navab, N. Deeper depth prediction with fully convolutional residual networks. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016; pp. 239–248. [Google Scholar]
  16. Chen, D. Evaluating asphalt pavement surface texture using 3D digital imaging. Int. J. Pavement Eng. 2020, 21, 416–427. [Google Scholar] [CrossRef]
  17. Acharya, P.K.; Henderson, T.C. Parameter estimation and error analysis of range data. In Proceedings of the 1988 IEEE International Conference on Robotics and Automation, Philadelphia, PA, USA, 24–29 April 1988; pp. 1709–1714. [Google Scholar]
  18. Guan, Y.; Cheng, X.; Shi, G. A robust method for fitting a plane to point clouds. J. Tongji Univ. (Nat. Sci.) 2008, 36, 981–984. [Google Scholar]
  19. Min, C.; Chen, S. Conditional extremums of functions of multi-variables. Stud. Coll. Math. 2021, 24, 72–75. [Google Scholar]
  20. Ma, F.; Karaman, S. Sparse-to-dense: Depth prediction from sparse depth samples and a single image. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 4796–4803. [Google Scholar]
  21. Zwald, L.; Lambert, L.S. The BerHu penalty and the grouped effect. arXiv 2012, arXiv:1207.6868. [Google Scholar]
  22. Xu, W.; Zou, L.; Wu, L.; Fu, Z. Self-Supervised monocular depth learning in low-texture areas. Remote Sens. 2021, 13, 1673. [Google Scholar] [CrossRef]
  23. Chen, J.; Huang, X.; Zheng, B.; Zhao, R.; Liu, X.; Cao, Q.; Zhu, S. Real-time identification system of asphalt pavement texture based on the close-range photogrammetry. Constr. Build. Mater. 2019, 226, 910–919. [Google Scholar] [CrossRef]
  24. Miao, Y.; Wang, L.; Wang, X.; Gong, X. Characterizing asphalt pavement 3-D macrotexture using features of co-occurrence matrix. Int. J. Pavement Res. Technol. 2015, 8, 243. [Google Scholar]
  25. Liu, F.; Shen, C.; Lin, G. Deep convolutional neural fields for depth estimation from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5162–5170. [Google Scholar]
  26. Wang, P.; Shen, X.; Lin, Z.; Cohen, S.; Price, B.; Yuille, A.L. Towards unified depth and semantic prediction from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 2800–2809. [Google Scholar]
  27. Hao, Z.; Li, Y.; You, S.; Lu, F. Detail preserving depth estimation from a single image using attention guided networks. In Proceedings of the 2018 International Conference on 3D Vision (3DV), Verona, Italy, 5–8 September 2018; pp. 304–313. [Google Scholar]
  28. Dong, S.; Han, S.; Wu, C.; Xu, O.; Kong, H. Asphalt pavement macrotexture reconstruction from monocular image based on deep convolutional neural network. Comput.-Aided Civ. Infrastruct. Eng. 2022, 37, 1754–1768. [Google Scholar] [CrossRef]
  29. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A Comprehensive Survey on Transfer Learning. Proc. IEEE 2022, 109, 43–76. [Google Scholar] [CrossRef]
  30. JTG 3450-2019; Field Test Methods of Subgrade and Pavement for Highway Engineering. Ministry of Transport of the People’s Republic of China: Beijing, China, 2019.
Figure 1. Flowchart.
Figure 2. Establishment of macroscopic texture data set of asphalt pavement: (a) Acquisition of asphalt pavement texture image; (b) Uncut 2D images; (c) 2D image after cutting; (d) Scanning point cloud depth image.
Figure 3. Depth map before noise reduction vs. depth map after noise reduction: (a) Outliers in point cloud images; (b) Point cloud depth map after noise reduction.
Figure 4. Level correction of texture data: (a) Tilted texture point cloud depth map; (b) The corrected texture point cloud depth map.
Figure 5. Image enhancement.
Figure 6. Network model.
Figure 7. Residual structure, batch normalization and activation function position: (a) The network structure of the residual unit in ResNet; (b) Batch normalization module and activation function location.
Figure 8. Loss curve with number of iterations.
Figure 9. Sanding method.
Figure 10. Training process under different training strategies: (a) Transfer learning is not used; (b) Using transfer learning.
Figure 11. MAE, RMSE and REL results of different types of mixtures: (a) AC mixture; (b) SMA mixture; (c) OGFC mixture; (d) PA mixture.
Figure 12. Example of 3D reconstruction results for pavement macro-texture.
Figure 13. Evaluation of the effectiveness of reconstructed pavement macro-texture: (a) Correlation of MTD; (b) Correlation of f8.
Figure 14. Comparison of the mean texture depth results for the two methods.
Figure 15. 3D reconstruction of macrotexture for dense-graded asphalt pavement.
Figure 16. 3D reconstruction of macrotexture for semi-open-graded asphalt pavement.
Figure 17. 3D reconstruction of macrotexture for open-graded asphalt pavement.
Figure 18. Pendulum friction instrument on-site detection.
Figure 19. Linear relationship between the mean texture depth of the 3D reconstruction model of the road surface and the value of FB20.
Figure 20. Correlation between the texture depth of the 3D model of the road surface and the pendulum FB20.
Table 1. Parameters of laser pavement texture analyzer.

Parameter | Value
Scan lines | 70
Required scanning width | 72.009 mm
Actual scan width | 71.5645 mm
Scanning width resolution | 0.0246 mm
Scanning width spacing | 1.04 mm
Scan length | 101.6 mm
Scanning length spacing | 0.0356 mm
Table 2. The size of the output characteristics and the input/output channels of each layer.

Number | Block Name | Input/C | Output/C | Output Size
— | Conv1 | 3 | 64 | 104 × 72
×3 | Residual Block 1 | 64 | 256 | 52 × 36
×4 | Residual Block 2 | 256 | 512 | 26 × 18
×23 | Residual Block 3 | 512 | 1024 | 13 × 9
×3 | Residual Block 4 | 1024 | 2048 | 13 × 9
— | Conv2 | 2048 | 1024 | 13 × 9
— | Up1 | 1024 | 512 | 26 × 18
— | Up2 | 512 | 256 | 52 × 36
— | Up3 | 256 | 128 | 104 × 72
— | Up4 | 128 | 64 | 208 × 144
— | Up5 | 256 | 16 | 208 × 144
— | Up6 | 512 | 16 | 208 × 144
— | Up7 | 1024 | 16 | 208 × 144
— | Up8 | 2048 | 16 | 208 × 144
— | Conv3 | 64 | 64 | 208 × 144
— | Conv4 | 3 | 32 | 208 × 144
— | Conv5 | 128 | 128 | 208 × 144
— | Conv6 | 128 | 128 | 208 × 144
— | Conv7 | 128 | 1 | 208 × 144
Table 3. Experimental results compared with other models on different data sets.

Model | RMSE (V2) | RMSE (RGB-D) | REL (V2) | REL (RGB-D) | δ = 1 (V2) | δ = 1 (RGB-D) | δ = 2 (V2) | δ = 2 (RGB-D) | δ = 3 (V2) | δ = 3 (RGB-D)
Liu [25] | 0.824 | 1.012 | 0.230 | 0.421 | 0.614 | 0.492 | 0.883 | 0.832 | 0.971 | 0.828
Wang [26] | 0.745 | 0.623 | 0.220 | 0.465 | 0.605 | 0.548 | 0.890 | 0.810 | 0.970 | 0.914
Hao [27] | 0.555 | 0.715 | 0.127 | 0.221 | 0.841 | 0.726 | 0.966 | 0.892 | 0.991 | 0.957
Dong [28] | 0.592 | 0.668 | 0.139 | 0.275 | 0.826 | 0.879 | 0.946 | 0.927 | 0.987 | 0.933
Ours | 0.630 | 0.491 | 0.135 | 0.102 | 0.801 | 0.931 | 0.951 | 0.979 | 0.986 | 0.990

(V2 = NYU Depth V2 dataset; RGB-D = asphalt macro-texture RGB-D dataset.)
Table 4. Comparison of reconstructed texture results for dense asphalt pavement.

Measuring Point Number | MTD | MTD′ | REL (%)
1 | 1.45 | 1.38 | 5.07
2 | 1.38 | 1.44 | 4.17
3 | 1.19 | 1.23 | 3.25
4 | 1.30 | 1.35 | 3.70
5 | 1.46 | 1.39 | 5.04
6 | 1.33 | 1.26 | 5.56
7 | 1.12 | 1.18 | 5.08
8 | 1.36 | 1.32 | 3.03
9 | 1.44 | 1.50 | 4.00
10 | 1.33 | 1.41 | 5.67
11 | 1.12 | 1.16 | 3.45
12 | 1.49 | 1.53 | 2.61
13 | 1.30 | 1.33 | 2.25
Table 5. Comparison of texture reconstruction results for semi-open graded asphalt pavement.

Measuring Point Number | MTD | MTD′ | REL (%)
1 | 1.74 | 1.80 | 3.33
2 | 1.92 | 1.96 | 2.04
3 | 1.83 | 1.89 | 3.17
4 | 1.78 | 1.73 | 2.89
5 | 1.90 | 1.93 | 1.55
6 | 1.72 | 1.67 | 2.99
7 | 1.74 | 1.69 | 2.96
8 | 2.00 | 2.05 | 2.44
9 | 1.95 | 1.90 | 2.63
10 | 1.91 | 1.96 | 2.55
11 | 1.68 | 1.75 | 4.00
12 | 1.75 | 1.68 | 4.17
13 | 1.57 | 1.66 | 5.42
14 | 1.66 | 1.63 | 1.84
15 | 1.83 | 1.79 | 2.23
Table 6. Comparison of reconstructed texture results for open-graded asphalt pavement.

Measuring Point Number | MTD | MTD′ | REL (%)
1 | 2.04 | 1.94 | 5.15
2 | 2.14 | 2.24 | −4.46
3 | 2.36 | 2.27 | 3.96
4 | 2.40 | 2.46 | −2.44
5 | 2.14 | 2.21 | −3.17
6 | 2.57 | 2.48 | 3.63
7 | 2.51 | 2.59 | −3.09
8 | 2.18 | 2.23 | −2.24
9 | 2.08 | 2.01 | 3.48
10 | 2.13 | 2.04 | 4.41
11 | 2.35 | 2.27 | 3.52
12 | 2.08 | 2.01 | 3.48
13 | 2.60 | 2.52 | 3.17
14 | 2.41 | 2.50 | −3.60
15 | 2.03 | 1.95 | 4.10
16 | 2.07 | 2.01 | 2.99
17 | 2.17 | 2.25 | −3.56
18 | 2.47 | 2.53 | −2.37
19 | 2.12 | 2.18 | −2.75
20 | 2.15 | 2.22 | −3.15
21 | 2.31 | 2.37 | −2.53
22 | 2.41 | 2.46 | −2.03
Table 7. Temperature correction value.

Observed Temperature T (°C) | Temperature Correction ΔF
0 | −6
5 | −4
10 | −3
15 | −1
20 | 0
25 | +2
30 | +3
35 | +4
40 | +7
Table 8. Test results for pendulum friction tester for four road sections.

Number | FB10 | FB20 | Number | FB10 | FB20
1 | 65.37 | 62.37 | 26 | 65.55 | 62.55
2 | 66.56 | 63.56 | 27 | 66.13 | 63.13
3 | 65.12 | 62.12 | 28 | 67.92 | 64.92
4 | 63.98 | 60.98 | 29 | 67.44 | 64.44
5 | 67.46 | 64.46 | 30 | 65.61 | 62.61
6 | 67.34 | 64.34 | 31 | 66.50 | 63.50
7 | 64.97 | 61.97 | 32 | 66.52 | 63.52
8 | 66.13 | 63.13 | 33 | 67.36 | 64.36
9 | 65.58 | 62.58 | 34 | 63.39 | 60.39
10 | 62.25 | 59.25 | 35 | 65.58 | 62.58
11 | 67.48 | 64.48 | 36 | 64.83 | 61.83
12 | 65.12 | 62.12 | 37 | 63.98 | 60.98
13 | 66.64 | 63.64 | 38 | 62.49 | 59.49
14 | 65.61 | 62.61 | 39 | 66.58 | 63.58
15 | 64.48 | 61.48 | 40 | 64.40 | 61.40
16 | 67.46 | 64.46 | 41 | 62.07 | 59.07
17 | 65.55 | 62.55 | 42 | 62.32 | 59.32
18 | 67.56 | 64.56 | 43 | 67.24 | 64.24
19 | 62.21 | 59.21 | 44 | 63.78 | 60.78
20 | 64.80 | 61.80 | 45 | 64.62 | 61.62
21 | 66.39 | 63.39 | 46 | 63.39 | 60.39
22 | 66.32 | 63.32 | 47 | 67.39 | 64.39
23 | 66.64 | 63.64 | 48 | 64.48 | 61.48
24 | 67.48 | 64.48 | 49 | 67.48 | 64.48
25 | 66.45 | 63.45 | 50 | 62.32 | 59.32
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
