1. Introduction
Land subsidence is an environmental geological phenomenon in which regional ground elevation decreases as the surface soil of the Earth’s crust is compressed under the action of natural and human factors [1]. It is one of the most common geological disasters worldwide [2]. In recent years, land subsidence in China has grown in both severity and extent as urbanization has accelerated. Urban areas are the most affected, owing to factors such as the over-exploitation of groundwater, tunneling, and urban expansion, and subsidence there is characterized by slow onset, long duration, a wide impact range, complex causal mechanisms, and great difficulty in prevention and control [2,3,4]. In the past 40 years, the losses caused by land subsidence in China have reached 300 billion. Urban land subsidence not only affects the productivity and daily lives of urban residents and traffic safety but also severely hinders sustainable socioeconomic development [1]. Developing an effective method for predicting land subsidence is therefore both urgent and challenging [1].
Kunming, the capital and largest city of Yunnan Province in southwestern China, lies in the middle of the Yunnan–Guizhou Plateau, bordered by Dianchi Lake to the south and by mountains on the other three sides. Situated on China’s frontier with Southeast and South Asia, Kunming has undergone massive urban construction over the past two decades, during which a large amount of arable land and gardens has been occupied by high-rise buildings, roads, and other structures [5]. However, this large-scale urbanization has induced a series of geological hazards, such as landslides, subsidence, and ground cracks. In particular, the extent, magnitude, and rate of ground subsidence vary from year to year but show an overall increasing trend, making subsidence one of the most prominent geological hazards in the city and causing significant damage to houses, roads, canals, pipelines, and other infrastructure [4]. Ground subsidence in Kunming is the loss of ground elevation caused by a combination of natural and human factors. The major natural factors include geological formations and hydrogeology, while the human factors primarily consist of groundwater extraction and urban construction [6,7].
Interferometric synthetic aperture radar (InSAR) is an all-weather monitoring technique with the advantages of wide coverage, low cost, high speed, and high accuracy. It can be used to measure surface deformation precisely in applications such as urban land subsidence [8,9,10], landslide monitoring [11,12], earthquake analysis [13], infrastructure assessment [14,15], and others [16]. With the development of high-resolution synthetic aperture radar (SAR), remote sensing has entered the era of high resolution, showing great potential for the refined monitoring of surface deformation. Moreover, data from the Sentinel-1A/B Earth observation satellites of the European Space Agency’s Copernicus program (Global Monitoring for Environment and Security, GMES) are free and openly accessible, providing users with abundant SAR data. However, acquiring, storing, and preprocessing a large number of images over the monitoring area and then performing a series of time-series parameter inversions requires considerable processing time and places heavy demands on computer performance, disk space, and other hardware [17]. These demands become even greater when monitoring large-scale surface deformation. In addition, atmospheric delay errors and phase-unwrapping errors are two of the main error sources in deformation inversion, so both must be handled when extracting surface deformation information. LiCSBAS, an open-source InSAR time-series analysis package proposed by Morishita et al. [17], applies the small-baseline subset approach to products from LiCSAR, an automated Sentinel-1 InSAR processor. It substantially reduces the processing time required for large-scale monitoring and effectively mitigates both atmospheric delay errors and phase-unwrapping errors, which makes it especially suitable for obtaining large-scale urban land-subsidence deformation information.
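The core of a small-baseline time-series inversion can be sketched as a least-squares problem: each interferogram measures the displacement difference between two acquisition epochs, and the cumulative displacement at every epoch is solved relative to the first one. The sketch below is a simplified illustration, not the LiCSBAS implementation. It assumes unwrapped, already corrected line-of-sight displacements (in mm) for a single pixel, a connected interferogram network, and a hypothetical `(i, j)` pair format; LiCSBAS itself adds further steps such as loop-closure checks and atmospheric correction.

```python
import numpy as np

def sbas_invert(pairs, ifg_disp, n_epochs):
    """Least-squares small-baseline inversion for one pixel.

    pairs    : list of (i, j) epoch indices with i < j (hypothetical input format)
    ifg_disp : displacement (mm) measured by each interferogram, i.e. d_j - d_i
    n_epochs : number of SAR acquisition epochs
    Returns the cumulative displacement per epoch, with epoch 0 fixed at 0 mm.
    """
    # Unknowns are the displacements at epochs 1..n-1; epoch 0 is the reference.
    G = np.zeros((len(pairs), n_epochs - 1))
    for row, (i, j) in enumerate(pairs):
        G[row, j - 1] += 1.0          # later epoch enters with +1
        if i > 0:
            G[row, i - 1] -= 1.0      # earlier epoch enters with -1 (unless it is the reference)
    d, *_ = np.linalg.lstsq(G, np.asarray(ifg_disp, dtype=float), rcond=None)
    return np.concatenate([[0.0], d])

# Five redundant interferograms over four epochs (made-up values).
est = sbas_invert(pairs=[(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)],
                  ifg_disp=[-2.0, -3.0, -2.0, -5.0, -5.0],
                  n_epochs=4)
# est ≈ [0, -2, -5, -7] mm
```

Because the network contains more interferograms than unknown epochs, least squares also averages out noise when the measurements are not perfectly consistent; a mean subsidence velocity can then be estimated by a linear fit of the result against acquisition time.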
Traditional land-subsidence prediction models fall into three main categories: (1) physical process models, which are based on the physical mechanism of subsidence, combine factors such as soil and water with geotechnics to model the physical process of land subsidence [18], and make predictions by simulating that process; (2) mathematical–statistical models, which usually use discrete time-series data to predict future subsidence from the statistical behavior of historical subsidence [18]; and (3) neural network models, which generally use acquired subsidence data to predict future subsidence through neural networks [19,20]. However, physical process models involve complex parameters whose data are difficult to obtain, which significantly limits their application [1]. The simple statistical laws of mathematical–statistical models cannot easily explain complex ground subsidence phenomena, these models frequently lack a physical and geoscientific basis, and the accuracy of time-series methods depends on the quality of the historical data [1,21]. In recent years, neural network models have been widely applied in many fields [19,22,23], and related studies have reported good results in predicting various types of engineering deformation. Because urban land subsidence is nonlinear and neural networks have powerful nonlinear mapping capabilities, there is a theoretical basis for predicting urban land subsidence with neural network models. However, existing neural network models have three drawbacks: (i) they rely excessively on historical subsidence data, so they mainly fit existing data and cannot make effective predictions; (ii) they cannot accurately capture or predict fluctuations in the deformation time series and therefore do not yield satisfactory predictions [21]; and (iii) they are limited by their training samples, which allow only small-scale, not large-scale, subsidence prediction. To address these drawbacks and enable large-scale urban land-subsidence prediction, a literature review was performed. It showed that researchers have established land-subsidence prediction models based on machine learning from a multi-factorial perspective [24]. Having determined the nonlinear relationship between the influencing factors and land subsidence, an XGBoost (eXtreme Gradient Boosting) land-subsidence prediction model achieved good results. Following the same idea, building a land-subsidence prediction model on a neural network algorithm from a multi-factorial perspective can address the inability of existing neural network models to predict, rather than merely fit, land subsidence and overcome their over-reliance on subsidence data. To address the inability of existing neural network models to accurately capture or predict fluctuations in the deformation time series, InSAR deformation time-series prediction based on a long short-term memory (LSTM) neural network has been proposed [21]. This approach overcomes the limitations of earlier fitting analyses based only on existing data, performs multi-factorial prediction for a single subsidence point, and accurately captures and predicts sequence deformation. For large-scale urban land subsidence, which contains many different types of subsidence time series, the selection of training samples directly affects the final prediction accuracy. K-shape, a recent time-series clustering algorithm [25], can efficiently generate time-series clusters, so it was chosen to cluster the subsidence time series.
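K-shape [25] assigns series to clusters with the shape-based distance (SBD): both series are z-normalized, their normalized cross-correlation is computed at every relative shift, and SBD = 1 − (maximum normalized cross-correlation), which ranges from 0 to 2. A minimal NumPy sketch for two equal-length series follows. It computes the cross-correlation directly with `np.correlate` rather than with the FFT used by the original algorithm, omits the centroid (shape extraction) step, and is not the implementation used in this study; the function names are illustrative.

```python
import numpy as np

def zscore(x):
    """Z-normalize a series (zero mean, unit standard deviation)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def sbd(x, y):
    """Shape-based distance between two equal-length series.

    Returns (distance, shift): distance = 1 - max normalized cross-correlation,
    and shift > 0 means x is y delayed by `shift` samples.
    """
    x, y = zscore(x), zscore(y)
    cc = np.correlate(x, y, mode="full")      # lags -(n-1) .. (n-1)
    ncc = cc / (np.linalg.norm(x) * np.linalg.norm(y))
    best = int(np.argmax(ncc))
    return 1.0 - ncc[best], best - (len(y) - 1)

t = np.linspace(0, 4 * np.pi, 50)
print(sbd(np.sin(t), 3 * np.sin(t) + 5))      # distance ~0, shift 0 (scale/offset invariant)
y = [0, 0, 1, 2, 1, 0, 0, 0]
x = [0, 0, 0, 0, 1, 2, 1, 0]                  # y delayed by two samples
print(sbd(x, y))                              # distance ~0.125, shift 2
```

The second call shows shift invariance at work: the delayed pulse is matched at lag 2, and the distance is small but not zero because the z-normalized edges do not overlap perfectly. For complete clustering, one option is the `KShape` estimator in the `tslearn.clustering` module.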
Traditional artificial neural network (ANN) models offer self-adaptation, self-learning, nonlinear mapping, and fault tolerance. The back-propagation (BP) neural network in particular has a strong self-learning ability that allows it to tackle complex deformation problems, but its outputs depend on the initial weights and thresholds [21,26]. The genetic algorithm (GA) and the particle swarm optimization (PSO) algorithm are evolutionary computation techniques [27]. The GA is based on Darwin’s theory of evolution [28], whereas PSO is inspired by the social behavior of bird flocks and fish schools [28]. Both methods perform global optimization of variables and can therefore be used to optimize the weights and biases of neural network models, improving their prediction accuracy [29]. The K-shape time-series clustering algorithm proposed by John Paparrizos et al. [25] efficiently compares sequences and computes cluster centroids while ensuring scaling invariance, translation invariance, and transformation invariance. Its two main components are the shape-based distance (SBD) and an SBD-based centroid computation that preserves the shape and features of each cluster. Long short-term memory (LSTM) networks are a special type of recurrent neural network (RNN) designed for sequential data, with distinct advantages in learning the features of time-series data [21,30]. An LSTM network has two main components, the storage (memory) module and the gate module, which respectively stabilize the transmission of information and control which information is passed on. Compared with traditional ANNs, these modules enable LSTM networks to better capture fluctuations in time-series data and achieve better prediction results [31,32].
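The interplay between the storage module and the gates can be made concrete with one forward step of a standard LSTM cell, using the widely used formulation with forget, input, and output gates. In the sketch below, the cell state `c` plays the role of the storage module, and the gates decide what is kept, written, and exposed. The gate ordering in the stacked weight matrix is an arbitrary choice, the dimensions are hypothetical (six inputs, e.g. one per influencing factor, and eight hidden units), and the weights are random placeholders rather than trained parameters from this study. For actual training, a framework layer such as `torch.nn.LSTM` or `keras.layers.LSTM` would normally be used.

```python
import numpy as np

def sigmoid(z):
    """Logistic function used by the LSTM gates."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked here in the order [forget, input, candidate, output].
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[:H])               # forget gate: how much of c_prev to keep
    i = sigmoid(z[H:2 * H])          # input gate: how much new information to write
    g = np.tanh(z[2 * H:3 * H])      # candidate values for the memory cell
    o = sigmoid(z[3 * H:])           # output gate: how much of the cell to expose
    c_t = f * c_prev + i * g         # updated memory cell (storage module)
    h_t = o * np.tanh(c_t)           # hidden state passed to the next step
    return h_t, c_t

rng = np.random.default_rng(0)
D, H, T = 6, 8, 10                   # hypothetical: 6 factors, 8 hidden units, 10 steps
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(T, D)):  # a made-up 10-step input sequence
    h, c = lstm_step(x_t, h, c, W, U, b)
```

Because the cell state is updated additively (`f * c_prev + i * g`) rather than being overwritten at every step, information can persist across many time steps, which is what lets the network follow fluctuations in a deformation series.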
In summary, this study aims to address two problems: the lack of reliable sample data for large-scale urban land-subsidence prediction, and the tendency of existing neural network algorithms to fit historical data rather than genuinely predict. The main urban area of Kunming was taken as the study area. LiCSBAS was used to obtain land-subsidence deformation information for the main urban area of Kunming from 2018 to 2021, and the K-shape time-series clustering algorithm was adopted to cluster the resulting land-subsidence time series. Next, the clustered subsidence points were classified, and hydrogeology, geological structure, faults, groundwater, high-speed railways, and high-rise buildings were selected as influencing factors. A PSO-BP neural network model was then constructed to predict urban land subsidence from a multi-factorial perspective. Finally, fluctuations in the urban land-subsidence time series were predicted with an LSTM neural network, also from a multi-factorial perspective, to achieve large-scale, high-precision urban land-subsidence prediction.
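The PSO-BP idea, using particle swarm optimization to search for good initial weights and thresholds (biases) of a back-propagation network, can be sketched as follows. This is a minimal illustration under stated assumptions, not the model built in this study: it uses a hypothetical one-hidden-layer 6-5-1 network, synthetic data standing in for six influencing factors, and standard global-best PSO with an inertia weight. The swarm size, inertia, and acceleration coefficients are illustrative choices. In a full PSO-BP workflow, the best particle would then initialize ordinary gradient-based BP training, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in data (hypothetical): 200 points x 6 influencing factors -> subsidence rate.
X = rng.normal(size=(200, 6))
y = X @ rng.normal(size=6) + 0.5 * np.tanh(X[:, 0] * X[:, 1])

D_IN, D_HID = X.shape[1], 5
N_PARAMS = D_IN * D_HID + D_HID + D_HID + 1  # W1, b1, w2, b2 of a 6-5-1 network

def unpack(p):
    """Split a flat parameter vector into the weights and biases (thresholds) of the network."""
    W1 = p[:D_IN * D_HID].reshape(D_IN, D_HID)
    b1 = p[D_IN * D_HID:D_IN * D_HID + D_HID]
    w2 = p[D_IN * D_HID + D_HID:D_IN * D_HID + 2 * D_HID]
    b2 = p[-1]
    return W1, b1, w2, b2

def mse(p):
    """Fitness of one particle: training MSE of the network it encodes."""
    W1, b1, w2, b2 = unpack(p)
    pred = np.tanh(X @ W1 + b1) @ w2 + b2
    return float(np.mean((pred - y) ** 2))

def pso(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5):
    """Global-best PSO with inertia weight, minimizing f over R^dim."""
    pos = rng.uniform(-1.0, 1.0, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([f(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Velocity: inertia + pull toward each particle's own best + pull toward the swarm best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([f(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())

best_params, best_mse = pso(mse, N_PARAMS)
# best_params would next initialize ordinary back-propagation training (not shown).
```

Searching globally first reduces the sensitivity of BP training to its random initial weights and thresholds, which is the weakness of plain BP networks noted above.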
5. Conclusions
Existing neural network models rely excessively on historical subsidence data for urban land-subsidence prediction and cannot accurately capture or predict fluctuations in the sequence deformation. For large-scale urban land-subsidence prediction, improper selection of training samples directly degrades the final results and the prediction accuracy. To overcome these shortcomings of previous neural network models, this paper proposed a subsidence prediction method based on neural network algorithms from a multi-factorial perspective and adopted the K-shape clustering algorithm to select training samples over a large area. The subsidence rate and time-series subsidence of the main urban area of Kunming from 2018 to 2021 were then predicted to explore high-precision urban land-subsidence prediction methods. The conclusions are as follows.
(1) The LiCSBAS method can effectively monitor land subsidence in the main urban area of Kunming. Using this time-series method, a maximum land-subsidence rate of −30.591 was detected in the main urban area of Kunming from 2018 to 2021. There are four significant subsidence areas in the main urban area, unevenly distributed along Dianchi Lake. The results revealed that land subsidence is more likely to occur within 200–600 m of large commercial areas and high-rise buildings, within 400–1200 m of subway lines currently under construction, and where the average annual rainfall is 109–117 mm. Faults disrupt the stability of the soil structure and increase land subsidence. Hydrogeology, geological structure, and groundwater also influence land subsidence in the main urban area of Kunming.
(2) After clustering, the multi-factorial PSO-BP model can effectively predict large-scale urban land subsidence. The K-shape clusters (Cluster 1, Cluster 2, and Cluster 3) were predicted with the corresponding multi-factorial PSO-BP models for the validation and test sets. The mean square errors (MSE) for the validation and test sets were 1.519 and 1.465 for Cluster 1, 1.419 and 1.441 for Cluster 2, and 1.485 and 1.494 for Cluster 3. A smaller MSE indicates that the prediction model describes the experimental data more accurately, and no overfitting was observed. Of the 24,540 predicted subsidence-rate points, 24,432 (about 99.6%) had an absolute prediction error of less than 10 mm, and 108 (about 0.4%) had an absolute error greater than 10 mm. The prediction accuracy therefore met the requirements of the measurement specification, and it was slightly higher with clustering than without it.
(3) The constructed multi-factorial LSTM model can effectively capture and predict fluctuations in the sequence deformation. It was used to predict the next ten periods of an arbitrarily selected subsidence time series from each of the three clusters. The root mean square errors (RMSE) for Cluster 1, Cluster 2, and Cluster 3 were 0.445, 1.475, and 1.468 mm, respectively; the mean absolute errors (MAE) were 0.319, 1.214, and 1.167 mm; and the absolute errors ranged over 0.007–1.030, 0–3.001, and 0.401–3.679 mm. This prediction accuracy met the requirements of the measurement specifications.
(4) The results demonstrate the feasibility of large-scale, high-precision urban land-subsidence prediction. The multi-factorial prediction models constructed here can effectively predict the land-subsidence rate and time-series subsidence of large cities, suggesting that predictions can be made by inputting the corresponding influencing factors. By building the models on the relationship between multiple factors and subsidence, rather than on existing monitoring data alone as in previous studies, this research expands the application scope of land-subsidence prediction models and lays a foundation for large-scale urban land-subsidence prediction.
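The accuracy measures in conclusions (2) and (3) follow their standard definitions: MSE is the mean of the squared prediction errors, RMSE is its square root, and MAE is the mean absolute error. The sketch below computes them, together with the absolute-error range and the share of points within a ±10 mm tolerance, for arrays of observed and predicted values. The sample values are invented for illustration, and the function name and tolerance parameter are hypothetical. The last two lines check the proportions implied by the point counts in conclusion (2).

```python
import numpy as np

def accuracy_report(observed, predicted, tol_mm=10.0):
    """Error metrics of predicted vs. observed subsidence (same units, e.g. mm)."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - obs
    abs_err = np.abs(err)
    mse = float(np.mean(err ** 2))
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(abs_err)),
        "abs_error_range": (float(abs_err.min()), float(abs_err.max())),
        "pct_within_tol": 100.0 * float(np.mean(abs_err < tol_mm)),
    }

# Made-up values in mm, for illustration only (not data from this study).
obs = [-12.0, -8.5, -20.0, -3.0, -15.0]
pred = [-11.0, -9.5, -18.0, -3.0, -27.0]
report = accuracy_report(obs, pred)
# MSE 30.0, RMSE ~5.477, MAE 3.2, abs_error_range (0.0, 12.0), pct_within_tol 80.0

# Proportions from conclusion (2): 24,432 and 108 of 24,540 points.
share_ok = 100.0 * 24432 / 24540    # ~99.56 %
share_bad = 100.0 * 108 / 24540     # ~0.44 %
```

Because RMSE squares the errors before averaging, a single large miss (here the 12 mm error) raises RMSE well above MAE. Reporting both, as conclusion (3) does, shows whether the error is spread evenly or dominated by a few outliers.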