Article

Adaptive Difference Least Squares Support Vector Regression for Urban Road Collapse Timing Prediction

1 Key Laboratory of Geological Safety of Coastal Urban Underground Space, Ministry of Natural Resources, Qingdao 266061, China
2 Qingdao Key Laboratory of Groundwater Resources Protection and Rehabilitation, Qingdao 266061, China
3 Qingdao Geo-Engineering Surveying Institute (Qingdao Geological Exploration Development Bureau), Qingdao 266061, China
4 School of Information and Control Engineering, Qingdao University of Technology, Qingdao 266520, China
* Authors to whom correspondence should be addressed.
Symmetry 2024, 16(8), 977; https://doi.org/10.3390/sym16080977
Submission received: 27 June 2024 / Revised: 21 July 2024 / Accepted: 26 July 2024 / Published: 1 August 2024
(This article belongs to the Special Issue Applications Based on Symmetry/Asymmetry in Machine Learning)

Abstract

The accurate prediction of urban road collapses is of paramount importance for public safety and infrastructure management. However, the complex and variable nature of road subsidence mechanisms, coupled with the inherent noise and non-stationarity in the data, poses significant challenges to the development of precise and real-time prediction models. To address these challenges, this paper develops an Adaptive Difference Least Squares Support Vector Regression (AD-LSSVR) model. The AD-LSSVR model employs a difference transformation to process the input and output data, effectively reducing noise and enhancing model stability. This transformation extracts trends and features from the data, leveraging the symmetrical characteristics inherent within it. Additionally, the model parameters were optimized using grid search and cross-validation techniques, which systematically explore the parameter space and evaluate model performance on multiple subsets of data, ensuring both the precision and generalizability of the selected parameters. Moreover, a sliding window method was employed to address data sparsity and anomalies, ensuring the robustness and adaptability of the model. The experimental results demonstrate the superior adaptability and precision of the AD-LSSVR model in predicting road collapse timing, highlighting its effectiveness in handling complex nonlinear data.

1. Introduction

With the acceleration of urbanization, the intricacy of underground pipeline networks has witnessed a marked escalation. Pipeline failures can precipitate alterations in the subsurface soil composition, potentially leading to road failures and subsequent collapses [1,2]. Figure 1 illustrates the impact of such collapses on urban infrastructure, drawing attention to the disruption they cause to transportation networks and the daily lives of city residents. To tackle the above issues, a prompt and precise early-warning model is essential for forecasting the road collapse timing. However, the elusive nature, inherent unpredictability, and multifaceted causative agents associated with underground pipeline failures present significant obstacles to their accurate prediction [3].
In the ongoing quest to understand the complexities underlying road collapses and to advance detection methodologies, researchers have embarked on a wide array of studies [4,5,6]. This has led to the adoption of diverse technologies, such as ground-penetrating radar for the detection of potential underground voids and high-resolution synthetic aperture radar data for the quantification of urban subsidence [7,8]. Furthermore, the introduction of a real-time Global Navigation Satellite System has marked a significant advancement in the precise monitoring of ground subsidence [9]. Despite these advancements, the risk assessment method associated with this system is limited by several factors, including the need for continuous data input and interference from surface coverage.
In response to the above challenges, researchers have adopted a multifaceted approach that integrates numerical simulation, theoretical analysis, and laboratory experimentation to forecast the occurrence of road collapses [10]. For example, Nader et al. [11] employed Particle Image Velocimetry technology to monitor ground deformation and developed an empirical formula to delineate the complex functional relationship between maximum ground subsidence and the relative density of the ground. Galve et al. [12] conducted comprehensive comparative analyses of various collapse models, indicating that prediction models based on nearest-neighbor distance and sinkhole density exhibit enhanced reliability. Nonetheless, empirical formulas and modeling methodologies, as empirical approaches, may encounter accuracy limitations due to their specific applicability and potential lack of universality across diverse geological settings.
Data-driven soft sensing methods have emerged as a promising solution for predicting road collapse timings, offering an alternative to traditional techniques by utilizing easily measurable variables to predict elusive parameters [13,14,15,16,17,18,19]. Among these methods, Support Vector Machine Regression (SVR) and its variants are particularly notable for their ability to handle complex nonlinear relationships and high-dimensional data, even with small sample sizes, thereby ensuring robustness, generalizability, and accuracy in predictions [20,21,22]. Illustratively, Liu et al. [23] integrated Particle Swarm Optimization (PSO) with Least Squares Support Vector Regression (LSSVR) to refine the impeller design in a Liquefied Natural Gas cryogenic submersible pump. Similarly, Zhu et al. [24] employed kernel principal component analysis combined with PSO to develop a multi-output LSSVR for monitoring displacement in super-high arch dams. These methods, though effective, are computationally burdensome with large datasets and require optimization. Their sensitivity to noise and the challenge of interpreting results are notable constraints [25]. Addressing these challenges is key to refining data-driven soft sensing for more accurate road collapse risk assessment and management.
Drawing upon the analysis above, this paper proposes an Adaptive Difference LSSVR (AD-LSSVR) model for predicting the road collapse timing caused by underground pipeline leaks. The primary contributions of AD-LSSVR are as follows:
(1) A difference transformation method is integrated into the LSSVR model, which mitigates data drift and bolsters model stability by leveraging the symmetrical data characteristics.
(2) A systematic grid search and cross-validation strategy is developed to optimize the model parameters, enhancing its adaptability and accuracy by comprehensively evaluating the model’s performance across various data subsets.
(3) A sliding window technique is applied to address challenges associated with limited real-world data and potential data anomalies, thereby bolstering data integrity and ensuring the robustness of the modeling process.
(4) The experimental results demonstrate the superiority of AD-LSSVR in terms of timeliness and accuracy compared to conventional methods. Furthermore, the proposed method enables urban infrastructure managers to predict potential road collapses before they occur, facilitating timely interventions that may prevent potential damage and disruption.
The subsequent sections of the paper are structured as follows: Section 2 delineates the foundational principles of SVR; Section 3 details the construction of the AD-LSSVR model; Section 4 presents the experimental results and offers a comprehensive analysis; and Section 5 concludes the study, summarizing the key findings and highlighting the implications of the AD-LSSVR model for urban infrastructure risk management.

2. Preliminaries

In this section, the regression problem and basic concepts of SVR are briefly reviewed.

2.1. Regression Problem

For predicting the timing of road collapses, we modelled the task as a nonlinear regression problem. The regression function is expressed as follows:
$$ y = w^{T} \phi(x) + b \tag{1} $$
where y is the regression output that predicts the time to collapse for a given input feature vector x, w is the weight vector in the feature space, which determines the influence of each input feature on the predicted outcome, ϕ(x) is a feature mapping function that maps the original input into a higher-dimensional feature space, and b is the bias term, a constant that adjusts the model’s output to better fit the data.
The goal of the regression problem is to find the optimal values for w and b such that the prediction error between the actual and predicted collapse times is minimized. This is typically achieved through an optimization process that considers both the training data and the smoothness of the regression function, as defined by the regularization parameter C in the SVR framework.
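For illustration only, the following sketch evaluates Equation (1) for a toy feature map ϕ; the numerical values of x, w, and b are arbitrary placeholders, not quantities estimated in this paper.

```python
import numpy as np

def phi(x):
    # Toy feature map for illustration: raw features followed by their squares.
    return np.concatenate([x, x ** 2])

# Arbitrary example values (not taken from the paper).
x = np.array([0.2, 7.7, 168.0])                      # hypothetical input feature vector
w = np.array([0.1, -0.2, 0.01, 0.0, 0.05, -0.001])   # weight vector in the feature space
b = 0.5                                              # bias term

y = w @ phi(x) + b                                   # Equation (1): y = w^T phi(x) + b
```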

2.2. SVR

SVR is a regression technique that builds upon the principles of Support Vector Machines [26]. Like its classification counterpart, SVR relies on support vectors and margin maximization to achieve good generalization and efficiency. Moreover, SVR incorporates an ε-insensitive zone to handle continuous predictions robustly, ignoring minor fluctuations [24]. To tackle nonlinear regression tasks, SVR employs kernel functions that transform the input data into a higher-dimensional space, enabling the resolution of complex, nonlinear relationships that may be present in the original data [27].
The optimization objectives and constraints of SVR are defined as follows:
$$ \min_{w, b, \xi_i, \xi_i^{*}} J(w, b, \xi_i, \xi_i^{*}) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} \left( \xi_i + \xi_i^{*} \right) \tag{2} $$

$$ \text{subject to} \quad \begin{cases} y_i - w^{T}\phi(x_i) - b \le \varepsilon + \xi_i \\ w^{T}\phi(x_i) + b - y_i \le \varepsilon + \xi_i^{*} \\ \xi_i \ge 0,\ \xi_i^{*} \ge 0, \quad i = 1, 2, \ldots, N. \end{cases} \tag{3} $$
where $y_i$ is the desired output, $\xi_i$ and $\xi_i^{*}$ are the non-negative slack variables representing the error size for each sample, N is the number of samples, w is the weight vector representing the normal vector of the decision boundary, b is the bias term representing the offset of the decision boundary, and C is the regularization parameter used to balance the complexity of the model against its performance on the training data; a larger value of C imposes a greater penalty on errors. ε is the parameter of the epsilon-insensitive loss function. The constraints in SVR are designed to ensure that the prediction error for each training sample does not exceed ε (up to the slack allowed by $\xi_i$ and $\xi_i^{*}$), and that the width of the margin is maximized.
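For reference, the ε-SVR formulation above corresponds to the standard implementation available in scikit-learn; the snippet below is a generic usage sketch on synthetic data, and the library call, data, and parameter values are illustrative assumptions rather than the configuration used in this study.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression data (illustrative only).
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=200)

# RBF-kernel epsilon-SVR: C penalizes errors falling outside the epsilon tube.
model = SVR(kernel="rbf", C=10.0, epsilon=0.05, gamma=0.5)
model.fit(X, y)

y_pred = model.predict(X[:5])
```

The least-squares variant used in the remainder of this paper replaces the ε-insensitive loss with a squared-error term, as described in Section 3.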

3. Adaptive Difference LSSVR Model

In this section, we detail the design of the AD-LSSVR model for predicting road collapse timing. LSSVR is chosen over traditional SVR due to its superior performance in time-to-event predictions, which is facilitated by its objective function that minimizes the squared error between predicted and actual collapse timing [28]. This focus on temporal accuracy, along with LSSVR’s capacity to handle high-dimensional data and small sample sizes, makes it a more suitable choice for predicting the precise timing of road collapse incidents. These attributes enable LSSVR to effectively capture the temporal dynamics of road collapse events, providing a robust and accurate predictive tool for urban infrastructure management.

3.1. Construction of AD-LSSVR

To address the challenges posed by noise and irrelevant fluctuations in the data, the AD-LSSVR model is designed to leverage the differential transformation of both input and output data. This approach serves to refine the dataset, thereby enhancing the model’s capacity to accurately discern and capture the underlying patterns and trends. This refinement is essential for achieving reliable and precise predictions in our specific application domain, such as urban road collapse prediction.
For (x, y), x = [x1, x2, …, x6] represents various physical characteristics and parameters that are believed to influence the timing of road collapses:
x1 (Dr): relative density of soil;
x2 (cm/s): flow velocity of underground water pipelines;
x3 (mm): maximum horizontal size of the cavity in the plan view;
x4 (mm): maximum vertical size of the cavity in the plan view;
x5 (mm): maximum horizontal size of the cavity in the frontal view;
x6 (mm): maximum vertical size of the cavity in the frontal view.
These input variables are quantifiable measures that are expected to correlate with the output variable y(min), which is the timing of road collapses.
The AD-LSSVR model is constructed to optimize the following objective function, as outlined in [29]:
$$ \min_{w, b, \xi_i} J(w, b, \xi_i) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} \xi_i^2 \tag{4} $$
subject to the constraints
$$ \Delta y_i = w^{T} \phi(\Delta x_i) + b + \xi_i, \quad i = 1, 2, \ldots, N \tag{5} $$
where $\Delta x_i = x_i - x_{i-1}$ and $\Delta y_i = y_i - y_{i-1}$ are the first-order differences, $\Delta \hat{y}_i$ denotes the predicted value of $\Delta y_i$, and $\xi_i = \Delta y_i - \Delta \hat{y}_i$ is the modeling error.
In the AD-LSSVR model, minimizing $\|w\|^2$ aims to find a smooth (i.e., not overly complex) model. $\xi = [\xi_1, \xi_2, \ldots, \xi_N]^{T}$ is the error vector, allowing the model a certain degree of flexibility under the constraints.
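A minimal sketch of this difference transformation, together with the inverse step that maps a predicted increment back to an absolute collapse time, is given below; the helper names are illustrative and are not the authors' code.

```python
import numpy as np

def difference_transform(X, y):
    """First-order differences of inputs and outputs: dx_i = x_i - x_{i-1}."""
    dX = np.diff(X, axis=0)   # shape (N-1, n_features)
    dy = np.diff(y)           # shape (N-1,)
    return dX, dy

def undifference(y_prev, dy_pred):
    """Recover an absolute timing prediction from the previous value and a predicted increment."""
    return y_prev + dy_pred
```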
To solve the LSSVR optimization problem, the Lagrangian function is constructed as follows:
$$ L(w, b, \xi, \alpha) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} \xi_i^2 - \sum_{i=1}^{N} \alpha_i \left( \xi_i + w^{T}\phi(\Delta x_i) + b - \Delta y_i \right) \tag{6} $$
where $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_N]^{T}$ is the vector of Lagrange multipliers, which is used to ensure that $w$, $b$, and $\xi$ satisfy the above constraints.
By taking derivatives concerning w, b, and ξ and setting them to zero, we can derive the solutions for these parameters.
$$ \frac{\partial L}{\partial w} = w - \sum_{i=1}^{N} \alpha_i \phi(\Delta x_i) = 0 \tag{7} $$

$$ \frac{\partial L}{\partial b} = -\sum_{i=1}^{N} \alpha_i = 0 \tag{8} $$

$$ \frac{\partial L}{\partial \xi_i} = 2C\xi_i - \alpha_i = 0 \tag{9} $$
Substituting Equations (7)–(9) into Equation (6) and expressing the inner product of vectors as K ( Δ x i , Δ x j ) , we obtain
$$ L(\alpha) = \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j K(\Delta x_i, \Delta x_j) + \frac{1}{4C} \sum_{i=1}^{N} \alpha_i^2 - \sum_{i=1}^{N} \alpha_i \Delta y_i. \tag{10} $$
For computational ease, we transform the task of minimizing L(α) in Equation (10) into a maximization problem of its negative, as stated in Equation (11).
$$ \max_{\alpha} J(\alpha) = -\frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j K(\Delta x_i, \Delta x_j) - \frac{1}{4C} \sum_{i=1}^{N} \alpha_i^2 + \sum_{i=1}^{N} \alpha_i \Delta y_i \tag{11} $$
where K ( Δ x i , Δ x j ) is the kernel function.
The Gaussian kernel function is chosen for its robust nonlinear mapping capabilities, which are essential for capturing complex relationships in the data. By mapping data into a high-dimensional space, the Gaussian kernel enables the model to capture more detailed data features, thus improving prediction accuracy. Additionally, it allows for controlling the model’s smoothness by adjusting the bandwidth parameter, achieving an effective balance between overfitting and underfitting, thus enhancing the model’s generalization ability [30]. Given the strong nonlinearity and noise in the high-dimensional data used in this study, and based on our comparison with other kernel functions, we chose the Gaussian kernel function for data mapping. The Gaussian kernel function is defined as
$$ K(\Delta x_i, \Delta x_j) = \exp\left( -\gamma \| \Delta x_i - \Delta x_j \|^2 \right) \tag{12} $$
where $\gamma$ is a positive constant that controls the width of the kernel, and $\|\Delta x_i - \Delta x_j\|^2$ represents the squared Euclidean distance between $\Delta x_i$ and $\Delta x_j$.
Finally, the prediction function for road collapse timing using the AD-LSSVR model is formulated with the obtained parameters α and b:

$$ \Delta \hat{y}_i = \sum_{j=1}^{N} \alpha_j K(\Delta x_i, \Delta x_j) + b \tag{13} $$
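In practice, conditions (7)-(9) together with constraint (5) reduce training to a single linear system in (b, α), which is the usual way LSSVR is solved. The following NumPy sketch implements this route with the Gaussian kernel (12) and the prediction function (13); it is a schematic implementation of the formulation above under these standard assumptions, not the authors' code.

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2) for all pairs of rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def lssvr_fit(dX, dy, C, gamma):
    """Solve the KKT system implied by Eqs. (5) and (7)-(9) for (b, alpha)."""
    N = dX.shape[0]
    K = gaussian_kernel(dX, dX, gamma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(N) / (2.0 * C)   # from xi_i = alpha_i / (2C), Eq. (9)
    rhs = np.concatenate([[0.0], dy])
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                  # b, alpha

def lssvr_predict(dX_new, dX_train, alpha, b, gamma):
    """Prediction function of Eq. (13): dy_hat = sum_j alpha_j K(dx, dx_j) + b."""
    return gaussian_kernel(dX_new, dX_train, gamma) @ alpha + b
```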
Remark 1:
For the AD-LSSVR model, the difference transformation is pivotal in extracting trends and features that reflect the data’s symmetrical patterns. This process is crucial for capturing the inherent symmetrical characteristics of road collapse caused by underground pipeline leaks. By harnessing these symmetrical characteristics, the model gains a deeper understanding of the underlying patterns, resulting in improved prediction accuracy and robustness.
Remark 2:
The Lagrangian method utilizes Lagrange multipliers to convert constrained problems into unconstrained ones. By adjusting these multipliers, it identifies a balance point where the objective’s gradient aligns with the constraints, ensuring a conflict-free optimal solution.

3.2. Adaptive Parameters Optimization Based on Grid Search and Cross-Validation

To precisely determine the key parameters of the AD-LSSVR, this paper utilizes the grid search and cross-validation methods for the optimization of C and γ [31]. The adaptive parameter optimization process for the AD-LSSVR model, as depicted in Figure 2, is detailed as follows:
  • Initialize a comprehensive parameter grid for C and γ, ranging from 0 to 100 and from 0 to 0.5, respectively. This broad search space allows for a thorough exploration of parameter combinations.
  • Implement k-fold cross-validation to systematically evaluate the model performance of each parameter combination. The data should be evenly partitioned into k subsets, with each subset serving as the validation set in turn, while the remaining subsets are used for training.
  • Conduct a grid search to train the AD-LSSVR model across all combinations of C and γ, and assess the model performance on the validation set. This step identifies the most effective parameter combination.
  • Select the optimal parameter combination based on the cross-validation results, ensuring that the model not only performs well on the training data but also generalizes effectively to unseen data.
  • Train the final model using these optimal parameters to enhance the model predictive accuracy on the entire dataset, thereby improving its performance on test data.
By adhering to these algorithmic steps that intertwine grid search and cross-validation, the model parameters are optimized, resulting in a robust and fine-tuned model for accurate road collapse timing prediction.
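A schematic implementation of this grid search with k-fold cross-validation is given below; it reuses the hypothetical lssvr_fit and lssvr_predict helpers from the previous sketch, and the grid resolution and fold count are placeholder choices rather than the settings used in the experiments.

```python
import numpy as np

def grid_search_cv(dX, dy, C_grid, gamma_grid, k=5):
    """Return the (C, gamma) pair with the lowest mean validation RMSE."""
    N = len(dy)
    folds = np.array_split(np.arange(N), k)
    best = (None, None, np.inf)
    for C in C_grid:
        for gamma in gamma_grid:
            errors = []
            for i in range(k):
                val = folds[i]
                trn = np.concatenate([folds[j] for j in range(k) if j != i])
                b, alpha = lssvr_fit(dX[trn], dy[trn], C, gamma)
                pred = lssvr_predict(dX[val], dX[trn], alpha, b, gamma)
                errors.append(np.sqrt(np.mean((dy[val] - pred) ** 2)))
            score = np.mean(errors)
            if score < best[2]:
                best = (C, gamma, score)
    return best[:2]

# Example grids covering the search ranges mentioned above (placeholder resolution).
C_grid = np.linspace(1, 100, 20)
gamma_grid = np.linspace(0.01, 0.5, 20)
```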
Remark 3:
The symmetric grid search process treats each combination of C and γ equally, while the cross-validation technique evaluates the model’s performance on multiple subsets of data, reflecting the symmetrical nature of the data and enhancing the model’s adaptability and accuracy.

3.3. Sliding Window Method for Data Processing

The sliding window technique was employed to address the challenges of limited and scarce data available for road collapse prediction due to pipeline leaks. This method is particularly effective for time series data, as it generates overlapping time windows to augment the number of data samples. The algorithm flow can be described as follows:
(1) Initialization: Load the entire dataset X, where each element xt represents a data point at time t. Define the window size W and the step size S for the sliding window.
(2) Window Creation: Initialize the window at the first data point of the dataset. Define the first sample s1 as the data points within the initial window.
(3) Sliding Window Processing: For each subsequent data point xt, extract the data points xt-W+1, xt-W+2, …, xt within the current window to form sample st. Then, shift the window to the right by S data points. Create a new sample st+1 using the data points in the new window position.
(4) Output and Iteration: Output the processed samples {s1, s2, …, sN}, where N is the total number of samples. Repeat steps (3) and (4) for each subsequent window until the end of the dataset is reached.
The number of overlapping samples created by the sliding window method is (WS + 1) × (1 + (W − 1)/S). If S = 1, this is equivalent to creating (WS + 1) samples.
This technique effectively increases the dataset size, allowing the model to learn and generalize the patterns within the time series more effectively.
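A minimal sketch of the windowing procedure in steps (1)-(4) is shown below, with the window size W and step size S as placeholder arguments.

```python
import numpy as np

def sliding_windows(X, W, S):
    """Return overlapping samples of length W taken every S points from series X."""
    X = np.asarray(X)
    samples = []
    for start in range(0, len(X) - W + 1, S):
        samples.append(X[start:start + W])
    return np.stack(samples)

# Example: 10 points, window of 4, step of 2 -> 4 overlapping samples.
windows = sliding_windows(np.arange(10), W=4, S=2)
```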

4. Experimental Study

To evaluate the effectiveness of the AD-LSSVR model, we conducted a comparative analysis with five alternative methods: the basic LSSVR, Long Short-Term Memory (LSTM) [32], Random Forest (RF) [33], Extreme Learning Machine (ELM) [34], and Random Walk (RW) [35]. The performance of these methods was examined through two benchmark experiments and an assessment of their precision in predicting road collapse timing.
All experiments were carried out in MATLAB R2023b on the same PC, equipped with an Intel® Core™ i5-13500 CPU @ 2.5 GHz and 16 GB of RAM. The models’ performance was evaluated using the Root Mean Square Error (RMSE) and the coefficient of determination R2.
$$ \mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 } \tag{14} $$

$$ 1 - R^2 = \frac{\mathrm{RMSE}^2}{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \bar{y} \right)^2 }. \tag{15} $$
where $\bar{y}$ is the average of the actual output values. RMSE quantifies the average magnitude of the prediction error, providing a measure of how well the model fits the sample data. Meanwhile, R2 indicates the proportion of the dependent-variable variance that is predictable from the model, representing the model’s explanatory power.
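For completeness, Equations (14) and (15) can be computed directly as follows (a straightforward NumPy sketch).

```python
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r_squared(y_true, y_pred):
    # R^2 = 1 - RMSE^2 / mean squared deviation from the sample mean, Eq. (15).
    return 1.0 - rmse(y_true, y_pred) ** 2 / np.mean((y_true - np.mean(y_true)) ** 2)
```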

4.1. Mackey–Glass Time Series Prediction

To verify the model’s time series prediction capability, the following Mackey–Glass time series model was employed [36].
$$ \frac{dy(t)}{dt} = \frac{0.2\, y(t - \tau)}{1 + y(t - \tau)^{10}} - 0.1\, y(t) \tag{16} $$
The production term coefficient was set to 0.2, which introduces system growth without being too rapid, ensuring the system’s complex dynamics; the decay coefficient was set to 0.1 to maintain the system’s dynamic activity, preventing it from quickly stabilizing or decaying; the nonlinearity exponent was set to 10, which effectively demonstrates chaotic behavior without being overly complex. The delay parameter was set to τ = 21, since the system begins to exhibit chaotic behavior once τ exceeds approximately 16.8. Because of the delay and nonlinearity of the Mackey–Glass equation, the series may exhibit an initial transient. Therefore, after selecting τ = 21, 200 sets of data were used as the training set and 50 sets as the testing set to better approximate the dataset in the experiments described below. Finally, Table 1 was obtained by averaging the results of thirty experiments, and Figure 3 shows the best performance among these thirty experiments.
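For reproducibility, the series defined by Equation (16) can be generated numerically, for example by Euler integration of the delay differential equation; the step size, initial history, and sample count below are illustrative choices rather than the exact settings used in this study.

```python
import numpy as np

def mackey_glass(n_points, tau=21.0, beta=0.2, gamma=0.1, n=10, dt=0.1, y0=1.2):
    """Euler integration of dy/dt = beta*y(t-tau)/(1 + y(t-tau)^n) - gamma*y(t)."""
    delay = int(round(tau / dt))            # delay expressed in integration steps
    y = np.full(n_points + delay, y0)       # constant initial history y(t <= 0) = y0
    for t in range(delay, n_points + delay - 1):
        y_tau = y[t - delay]
        y[t + 1] = y[t] + dt * (beta * y_tau / (1.0 + y_tau ** n) - gamma * y[t])
    return y[delay:]

# Example: generate a trajectory, subsample to unit spacing, and drop the transient.
series = mackey_glass(5000)[::10][100:]
```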
Figure 3 graphically compares the predictions and R2 values of the six methods on the Mackey–Glass time series: AD-LSSVR, LSSVR, LSTM, RF, ELM, and RW. The blue circles represent the predicted values, and the red line represents the actual values of the series. The AD-LSSVR model’s prediction curve aligns most closely with the actual data points, presenting a smoother and more stable pattern than the other algorithms, which suggests superior stability and a closer match to the data’s peaks and troughs. Conversely, the RW model displays the poorest performance, likely because the limited amount of training data hinders its ability to capture the complex temporal dynamics of the chaotic series.
Table 1 provides a detailed performance comparison of the different algorithms in terms of RMSE, R2, and average testing time. LSTM achieves the lowest RMSE of 0.89 and the highest R2 of 0.9967, indicating excellent predictive performance. AD-LSSVR also performs well, with an RMSE of 0.93 and an R2 of 0.9965, combining strong accuracy with an average testing time of only 0.015 s. LSSVR exhibits solid performance with an RMSE of 1.31 and an R2 of 0.9832, and its testing time is similarly short at 0.014 s. In contrast, RF, ELM, and RW perform notably worse, with RMSE values of 3.42, 3.76, and 6.32, respectively, and lower R2 values, although their testing times remain around 0.012 s. These findings collectively suggest that while LSTM offers the best accuracy on this benchmark, AD-LSSVR balances accuracy with speed, making it a suitable method for predicting the timing of road collapses.

4.2. Nonlinear Dynamic System Identification

The nonlinear dynamic system, as outlined in Equation (17), is a commonly employed benchmark for evaluating the nonlinear modeling capabilities of various algorithms [19].
$$ y(t+1) = \frac{ y(t)\, y(t-1) \left[ y(t) + 2.5 \right] }{ 1 + y^{2}(t) + y^{2}(t-1) } + u(t) \tag{17} $$
where y(0) = 0, y(1) = 0, and u(t) = sin(2πt/25). The system inputs are {y(t), y(t − 1), u(t)}, and the output is y(t + 1). In this experiment, data from time steps t = 1 to t = 140 were selected as training samples, while data from t = 141 to t = 200 were used as testing samples.
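A short sketch that simulates Equation (17) and assembles the input-output pairs {y(t), y(t − 1), u(t)} → y(t + 1) used for identification; variable names are illustrative.

```python
import numpy as np

def simulate_system(T=200):
    """Simulate y(t+1) = y(t)y(t-1)[y(t)+2.5] / (1 + y(t)^2 + y(t-1)^2) + u(t)."""
    y = np.zeros(T + 1)                               # y(0) = y(1) = 0
    u = np.sin(2.0 * np.pi * np.arange(T) / 25.0)
    for t in range(1, T):
        y[t + 1] = (y[t] * y[t - 1] * (y[t] + 2.5)
                    / (1.0 + y[t] ** 2 + y[t - 1] ** 2) + u[t])
    return y, u

y, u = simulate_system()
# Inputs {y(t), y(t-1), u(t)} and target y(t+1) for t = 1, ..., T-1.
X = np.column_stack([y[1:-1], y[:-2], u[1:]])
target = y[2:]
```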
Figure 4 graphically contrasts the predictive accuracy of the various models tested in this experiment. AD-LSSVR and LSTM are observed to exhibit the most precise fit to the actual data, showcasing their robust predictive prowess. LSSVR, RF, and ELM also perform well, albeit with a slightly reduced level of accuracy when compared to AD-LSSVR and LSTM. In contrast, RW demonstrates the poorest predictive performance, with its predictions deviating most noticeably from the ideal line, indicating its least effective handling of nonlinear dynamic data. Thus, AD-LSSVR and LSTM emerge as the top performers, while RW falls short in terms of predictive capability.
Table 2 presents a comprehensive comparison of the performance metrics for various algorithms, including RMSE, R2, and average testing time. The AD-LSSVR model achieved the lowest RMSE value of 0.0081 and the highest R2 value of 0.9971, with a fast average testing time of 0.011 s, indicating its strong predictive ability. LSTM also performed well, but with a longer testing time of 4.137 s. The performance of LSSVR, ELM, and RF was slightly inferior to that of LSTM, with RMSE values of 0.0207, 0.0153, and 0.0232 and R2 values of 0.9902, 0.9893, and 0.9873, respectively, within an average testing time of approximately 0.01 s. RW, however, exhibited poor performance in both RMSE and R2, indicating its limited effectiveness in predicting nonlinear dynamic systems. The results indicate that the AD-LSSVR model is highly effective, precise, and fast in identifying the nonlinear dynamic system.

4.3. Road Collapse Timing Prediction

In response to the growing issue of road collapses triggered by seepage from underground water pipelines, this section details an experimental investigation into predicting road collapse timing using the AD-LSSVR model. Figure 5 shows the entire process of predicting road collapse timing with the AD-LSSVR model. According to Figure 5, the experimental preparation is as follows:
(1) Feature Extraction: We collected ninety sets of data on factors related to road collapse under different operating conditions and grouped the data with the same relative compaction and flow rate together. As detailed in [37]: the relative density of soil—x1; the flow velocity of underground water pipelines—x2; the maximum horizontal and vertical size of the cavity in the plan view—x3, x4; the maximum horizontal and vertical size of the cavity in the frontal view—x5, x6; and collapse timing—y. Meanwhile, as shown in Table 3, we have listed some road collapse events.
(2) Data Partitioning and Normalization: The dataset, derived from laboratory monitoring of nine road collapse events, comprised 90 data groups. These groups were indexed chronologically to maintain temporal data correlations. The sliding window method was then applied to segment this dataset, resulting in 300 training samples and 30 testing samples, which were used to evaluate the methods. Additionally, the data were normalized to the [0, 1] range for enhanced accuracy (a minimal normalization sketch is given after this list).
(3) Model Training Setup: the AD-LSSVR model was trained using the training set data, as outlined in Section 3.
(4) Parameter Optimization: The optimal values of C and γ were identified through grid search and cross-validation, resulting in C = 12.92 and γ = 0.215. (The parameter configurations for the comparative methods are as follows: LSSVR employs a regularization parameter of 9.72 and a kernel parameter of 0.126; LSTM consists of 100 neurons in two layers and undergoes 100 iterations; RF utilizes 200 trees with a minimum of two samples per leaf node; ELM incorporates 500 neurons in the hidden layer; RW uses a walk length of 20 and a restart probability of 0.3.)
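A minimal sketch of the min-max normalization and chronological split referred to in step (2); the split sizes follow the numbers reported above, while the helper names are illustrative.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each column of X to the [0, 1] range (assumes non-constant columns)."""
    X_min, X_max = X.min(axis=0), X.max(axis=0)
    return (X - X_min) / (X_max - X_min), (X_min, X_max)

def chronological_split(samples, n_train=300, n_test=30):
    """Keep temporal order: the first n_train samples train, the last n_test test."""
    return samples[:n_train], samples[n_train:n_train + n_test]
```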
Figure 6 offers a graphical illustration of the prediction outcomes for six distinct methods: AD-LSSVR, LSSVR, LSTM, RF, ELM, and RW. The blue circular markers indicate the actual collapse timings, while the red star markers denote the predicted timings. The AD-LSSVR model’s prediction curve aligns most closely with the actual data points, presenting a smoother and more stable pattern than the other algorithms. This suggests that the AD-LSSVR model offers superior stability and a closer match to the actual data peaks and troughs. Conversely, the RW model displays the poorest performance, likely due to the limited amount of data available for training, which hinders its ability to accurately capture the complex temporal dynamics of the road collapse events.
Figure 7 visualizes the prediction errors for different methods, with AD-LSSVR displaying the narrowest maximum error range of −10 to 5. AD-LSSVR’s exceptional performance can be attributed to its effective data differentiation, enhanced adaptability, and robust handling of data anomalies. LSTM’s comparatively narrow error range reflects its low bias and consistent error distribution, indicative of its robust regression performance and high data fitting quality. In contrast, LSSVR, RF, ELM, and RW display broader error ranges and higher error metrics, indicating potential challenges in capturing temporal dynamics and handling complex data, which likely contribute to their lower predictive accuracy.
Table 4 provides a detailed performance comparison of the different algorithms in terms of RMSE, R2, and average testing time. AD-LSSVR achieves the lowest RMSE value of 1.94 and the highest R2 value of 0.9891, indicating excellent predictive performance, and it does so with an average testing time of only 0.016 s. LSTM also performs strongly, with an RMSE of 3.32 and an R2 of 0.9771, but its testing time is the longest at 6.3 s. In contrast, LSSVR, RF, ELM, and RW exhibit varying levels of performance: among these four, LSSVR achieves the lowest RMSE, while RF, ELM, and RW display markedly higher RMSE values than AD-LSSVR. These findings collectively suggest that AD-LSSVR is the most suitable method for predicting the timing of road collapses, balancing accuracy with efficiency.

5. Conclusions

The accurate prediction of urban road collapses is essential for public safety and infrastructure management. This paper proposes the AD-LSSVR model, designed to tackle the complexities and variabilities of road subsidence mechanisms, data noise, and non-stationarity. The model’s symmetric design, with difference transformation for data processing and grid search/cross-validation for parameter optimization, ensures balanced and robust performance. The integration of the sliding window technique further enhances data integrity and model robustness. Experimental results confirm the AD-LSSVR’s superior timeliness and accuracy, particularly in handling complex nonlinear data, highlighting its potential for practical application in urban infrastructure management.
This study provides a foundation for advancing predictive capabilities in urban infrastructure planning and maintenance, contributing to safer and more resilient city development and the ongoing dialog on sustainable urbanization. Our model, though promising, requires refinement and wider testing, particularly in its assumption of registering all cavities, a simplification that may not reflect real-world complexities. Future work will explore alternative parameters for more accurate predictions of cavity formation and collapse events.

Author Contributions

Conceptualization, Y.H. and L.Q.; methodology, L.Q. and Y.L.; software, Y.Z., M.L. and J.S.; data curation, Y.H.; writing—original draft preparation, L.Q. and Y.Z.; writing—review and editing, L.Q. and Y.L.; supervision, Y.H.; funding acquisition, Y.H. and L.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Shandong Province under Grant ZR2022ME214 and the Open Fund of the Key Laboratory of Geological Safety of Coastal Urban Underground Space, Ministry of Natural Resources under Grant BHKF2022Z03.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

Authors Yafang Han, Yanchun Liu, Minghou Li and Jian Shan were employed by the company Qingdao Geo-Engineering Surveying Institute. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Smith, J.; Chen, L. The impact of urbanization on underground infrastructure: A systematic review. Urban Infrastruct. J. 2022, 15, 134–150.
2. Lee, A.; Kumar, S. Subsurface soil dynamics and urban road failures: A case study approach. J. Geotech. Eng. 2023, 49, 445–461.
3. Gonzalez, M.; Rodriguez, H. Predicting pipeline failures: The challenge of complexity and unpredictability. Int. J. Pipeline Integr. 2021, 7, 32–47.
4. Patel, R.; Wong, K. Advancements in detection technologies for urban subsurface anomalies. Sens. Actuators A Phys. 2022, 310.
5. Yu, P.; Liu, H.; Wang, Z.; Fu, J.; Zhang, H.; Wang, J.; Yang, Q. Development of urban underground space in coastal cities in China: A review. Deep Undergr. Sci. Eng. 2023, 2, 148–172.
6. Martinez, J.; Garcia, A. Synthetic aperture radar for urban subsidence monitoring: A review. Remote Sens. Appl. Soc. Environ. 2023, 22, 100432.
7. Nguyen, D.; Tran, H. The role of real-time global navigation satellite systems in monitoring ground subsidence. Geomat. Nat. Hazards Risk 2021, 12, 2140–2155.
8. Alonso, J.; Moya, M.; Asensio, L.; de la Morena, G.; Galve, J.P.; Navarro, V. A catenary model for the analysis of arching effect in soils and its application to predicting sinkhole collapse. Géotechnique 2022, 72, 532–542.
9. Tao, T.; Liu, J.; Qu, X.; Gao, F. Real-time monitoring rapid ground subsidence using GNSS and Vondrak filter. Acta Geophys. 2019, 67, 133–140.
10. Fang, C.; Hong, L. Particle image velocimetry for combustion measurements: Applications and developments. Chin. J. Aeronaut. 2018, 31, 1407–1427.
11. Moussaei, N.; Khosravi, M.H.; Hossaini, M.F. Physical modeling of tunnel induced displacement in sandy grounds. Tunn. Undergr. Space Technol. 2019, 90, 19–27.
12. Galve, J.P.; Gutiérrez, F.; Remondo, J.; Bonachea, J.; Lucha, P.; Cendrero, A. Evaluating and comparing methods of sinkhole susceptibility mapping in the Ebro Valley evaporite karst (NE Spain). Geomorphology 2009, 111, 160–172.
13. Smith, J.; Doe, A. A comprehensive overview of data-driven soft sensing techniques for infrastructure monitoring. J. Infrastruct. Syst. 2022, 28, 105–119.
14. Quan, L.; Meng, X.; Qiao, J. Robust self-constructing fuzzy neural network-based online estimation for industrial product quality. IEEE Trans. Ind. Inform. 2024, 20, 2213–2222.
15. Hosseini, Y.; Mohammadi, R.K.; Yang, T.Y. Resource-based seismic resilience optimization of the blocked urban road network in emergency response phase considering uncertainties. Int. J. Disaster Risk Reduct. 2023, 85, 103496.
16. Zhou, M.; Luo, D. Enhancing road safety prediction with support vector machine regression. Saf. Sci. 2019, 117, 294–303.
17. Rose, R.L.; Mugi, S.R.; Saleh, J.H. Accident investigation and lessons not learned: AcciMap analysis of successive tailings dam collapses in Brazil. Reliab. Eng. Syst. Saf. 2023, 236, 109308.
18. Zhang, Z.; Qi, Q.; Cheng, Y.; Cui, D.; Yang, J. An integrated model for risk assessment of urban road collapse based on China accident data. Sustainability 2024, 16, 2055.
19. Meng, X.; Zhang, Y.; Quan, L.; Qiao, J. A self-organizing fuzzy neural network with hybrid learning algorithm for nonlinear system modeling. Inform. Sci. 2023, 642, 119145.
20. Zhao, P.; Zhang, L. Support vector machine regression for small sample size predictions in civil engineering applications. J. Civ. Eng. Manag. 2022, 28, 237–249.
21. Liang, L.; Su, T.; Gao, Y.; Qin, F.; Pan, M. FCDT-IWBOA-LSSVR: An innovative hybrid machine learning approach for efficient prediction of short-to-mid-term photovoltaic generation. J. Clean. Prod. 2023, 385, 135716.
22. Zhou, P.; Chen, W.; Yi, C.; Jiang, Z.; Yang, T.; Chai, T. Fast just-in-time-learning recursive multi-output LSSVR for quality prediction and control of multivariable dynamic systems. Eng. Appl. Artif. Intell. 2021, 100, 104168.
23. Liu, B.; Zhang, W.; Chen, F.; Cai, J.; Wang, X.; Liu, Y.; Zhang, J.; Wang, Q. Performance prediction and optimization strategy for LNG multistage centrifugal pump based on PSO-LSSVR surrogate model. Cryogenics 2024, 140, 103856.
24. Zhu, M.; Chen, B.; Gu, C.; Wu, Y.; Chen, W. Optimized multi-output LSSVR displacement monitoring model for super high arch dams based on dimensionality reduction of measured dam temperature field. Eng. Struct. 2022, 268, 114686.
25. Kumar, S.; Gupta, I. Challenges and solutions in the interpretability of support vector machine models: A road safety case study. Artif. Intell. Rev. 2023, 56, 1231–1256.
26. Izonin, I.; Tkachenko, R.; Shakhovska, N.; Lotoshynska, N. The Additive Input-Doubling Method Based on the SVR with Nonlinear Kernels: Small Data Approach. Symmetry 2021, 13, 612.
27. Hoang, N.; Tran, X.; Huynh, T. Prediction of pile bearing capacity using opposition-based differential flower pollination-optimized Least Squares Support Vector Regression (ODFP-LSSVR). Adv. Civ. Eng. 2022, 2022, 7183700.
28. Suykens, J.A.; Vandewalle, J. Least squares support vector machine classifiers. Neural Process. Lett. 1999, 9, 293–300.
29. Sun, X.; Wu, J.; Lei, G.; Cai, Y.; Chen, X.; Guo, Y. Torque modeling of a segmented-rotor SRM using maximum-correntropy-criterion-based LSSVR for torque calculation of EVs. IEEE J. Emerg. Sel. Top. Power Electron. 2020, 9, 2674–2684.
30. Zhang, L.; Kuang, Z.; Wang, Z.; Yang, Z.; Zhang, A. A node three-dimensional localization algorithm based on RSSI and LSSVR parameters optimization. Syst. Sci. Control Eng. 2020, 8, 477–487.
31. Cao, K.; Dong, F.; Liu, W.; Khan, N.M.; Cui, R.; Li, X.; Hussain, S.; Alarifi, S.S.; Niu, D. Infrared radiation denoising model of “sub-region-Gaussian kernel function” in the process of sandstone loading and fracture. Infrared Phys. Technol. 2023, 129, 104583.
32. Rabie, R.; Asghari, M.; Nosrati, H.; Niri, M.E.; Karimi, S. Spatially resolved air quality index prediction in megacities with a CNN-Bi-LSTM hybrid framework. Sustain. Cities Soc. 2024, 109, 105537.
33. Xu, W.; Tu, J.; Xu, N.; Liu, Z. Predicting daily heating energy consumption in residential buildings through integration of random forest model and meta-heuristic algorithms. Energy 2024, 301, 131726.
34. Rahman, M.; Rashid, F.; Roy, K.S.; Habib, M.A. Application of extreme learning machine (ELM) forecasting model on CO2 emission dataset of a natural gas-fired power plant in Dhaka, Bangladesh. Data Brief 2024, 54, 110491.
35. Aloisio, A.; Pasca, P.D.; Owolabi, D.; Loss, C. Vibration serviceability of hybrid CLT-steel composite floors based on experimental and numerical investigations using random walk models. Eng. Struct. 2024, 304, 117600.
36. Zhao, J.; Li, Y.; Yu, X.; Zhang, X. Levenberg-Marquardt Algorithm for Mackey-Glass Chaotic Time Series Prediction. Discret. Dyn. Nat. Soc. 2014, 2014, 193758.
37. Li, Z.K. Risk Assessment and Prediction of Ground Collapse Caused by Pipeline Leakage; Shandong Jianzhu University: Shandong, China, 2021. (In Chinese)
Figure 1. The impact of road collapses on urban streets in Qingdao, China.
Figure 2. Flowchart of the adaptive parameter optimization process.
Figure 3. R2 results of compared methods for Mackey–Glass time series prediction: (a) AD-LSSVR; (b) LSSVR; (c) LSTM; (d) RF; (e) ELM; and (f) RW.
Figure 4. R2 results of compared methods for nonlinear dynamic system identification: (a) AD-LSSVR; (b) LSSVR; (c) LSTM; (d) RF; (e) ELM; and (f) RW.
Figure 5. Flowchart of road collapse timing prediction process.
Figure 6. Prediction results of compared methods: (a) AD-LSSVR; (b) LSSVR; (c) LSTM; (d) RF; (e) ELM; and (f) RW.
Figure 7. Prediction errors of compared methods.
Table 1. Performance comparison of different methods for Mackey–Glass time series prediction.

Method       RMSE    R2       Testing Time (s)
AD-LSSVR     0.93    0.9965   0.015
LSSVR        1.31    0.9832   0.014
LSTM [32]    0.89    0.9967   5.600
RF [33]      3.42    0.9791   0.012
ELM [34]     3.76    0.9642   0.011
RW [35]      6.32    0.9104   0.012

The results in this table are the averages of 30 runs.
Table 2. Performance comparison of different methods for nonlinear dynamic system identification.

Method       RMSE     R2       Testing Time (s)
AD-LSSVR     0.0081   0.9971   0.011
LSSVR        0.0207   0.9902   0.010
LSTM [32]    0.0097   0.9954   4.137
RF [33]      0.0232   0.9873   0.009
ELM [34]     0.0153   0.9893   0.009
RW [35]      0.0768   0.9781   0.007

The results in this table are the averages of 30 runs.
Table 3. Partial road collapse data.

X1 (Dr)   X2 (cm/s)   X3 (mm)   X4 (mm)   X5 (mm)   X6 (mm)
0.2       7.7         168       100       105       150
0.2       7.7         162       94        98        145
0.2       7.7         168       98        92        134
0.2       7.7         155       94        102       122
0.2       7.7         156       97        100       136
Table 4. Performance comparison of different methods.

Method       RMSE    R2       Testing Time (s)
AD-LSSVR     1.97    0.9891   0.016
LSSVR        4.74    0.9575   0.014
LSTM [32]    3.32    0.9771   6.300
RF [33]      5.12    0.9472   0.013
ELM [34]     6.23    0.9363   0.012
RW [35]      11.5    0.8821   0.026

The results in this table are the averages of 30 runs.
