#### **2. Methodology**

#### *2.1. Overview*

The association mining method shown in Figure 1 was used to mine the association rules between the triggering factors and landslide displacement. It comprises three parts: feature engineering, clustering, and association rule mining.

**Figure 1.** Flowchart of association rule mining for the Lishanyuan landslide.

In the feature engineering part, for the original multi-source data obtained from landslide monitoring, the sliding window method is used to scan the monitoring data time series of each source. In the scanning process, the 3*σ* criterion is first used to eliminate obvious outliers, and then the corresponding features are calculated according to the type of monitoring data, and finally the feature time series data set is formed.
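The scanning procedure described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the function names, the window width, and the example feature `displacement_rate` are assumptions introduced here, and the 3*σ* criterion is applied within each window using the population standard deviation.

```python
from statistics import mean, pstdev

def three_sigma_filter(window):
    """Drop values farther than 3 standard deviations from the window mean."""
    mu = mean(window)
    sigma = pstdev(window)
    if sigma == 0:
        return list(window)
    return [v for v in window if abs(v - mu) <= 3 * sigma]

def displacement_rate(window):
    """Hypothetical feature: mean change per sampling step across the window."""
    if len(window) < 2:
        return 0.0
    return (window[-1] - window[0]) / (len(window) - 1)

def sliding_window_features(series, width, feature):
    """Scan `series` with a window of `width` samples; one feature value per window."""
    features = []
    for end in range(width, len(series) + 1):
        cleaned = three_sigma_filter(series[end - width:end])
        features.append(feature(cleaned))
    return features
```

One design consequence is worth noting: with the population standard deviation, a single extreme value in a window of *n* samples can deviate from the mean by at most (*n* − 1)/√*n* standard deviations, so the window must contain at least 11 samples before the 3*σ* criterion can reject an isolated outlier.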

In the clustering part, for the feature time series obtained in the previous part, the PSO-optimized k-means algorithm is first used for clustering, and then the time series are transformed into item sets, and finally the time series of all features are processed in the same way to build the transaction database.

In the association rule mining part, for the transaction database constructed in the previous section, the Apriori algorithm is used to mine the frequent item sets and association rules in the transaction database and analyze the disaster factors and destabilization precursors of landslides accordingly.

#### *2.2. PSO-Optimized k-Means Algorithm*

The Apriori algorithm for association rule mining operates on categories, so the original value-based monitoring dataset must be converted into a category-based transaction database. The k-means algorithm is the most well-known clustering algorithm; its core objective is to partition the dataset into *K* clusters such that the elements in each cluster have a high degree of similarity. The k-means algorithm is simple to implement and fast, but it is very sensitive to the choice of initial cluster centers: different initial values may lead to different clustering results, i.e., local optima rather than the global optimum. To solve this problem, we used the PSO algorithm for global optimization. The PSO algorithm is an evolutionary algorithm based on swarm intelligence that finds the optimal solution by simulating the process of a flock of birds searching for food. The specific steps of the k-means clustering algorithm optimized by PSO are as follows:

Step 1: Particle swarm initialization. Suppose there is a particle swarm composed of m particles in a given D-dimensional search space, and each particle has only two attributes: position and velocity, where position is the code of the solution to be solved and the velocity is the iteration step size.

For the *i*-th particle, its coordinate position can be expressed as:

$$X_i = \begin{pmatrix} x_{i1} & x_{i2} & \cdots & x_{iD} \end{pmatrix} \tag{1}$$

The velocity of the *i*-th particle can be expressed as:

$$V_i = \begin{pmatrix} v_{i1} & v_{i2} & \cdots & v_{iD} \end{pmatrix} \tag{2}$$

When performing k-means clustering on the dataset $D = \{x_1, x_2, \cdots, x_n\}$, the initial cluster centers $C = \{\mu_1, \mu_2, \cdots, \mu_k\}$ need to be specified. To avoid convergence to a locally optimal clustering caused by the sensitivity to $C$, we encoded $C$ as the particle position $X_i$ in Equation (1) for global optimization.

Step 2: Particle clustering and fitness calculation. Perform k-means clustering after decoding each particle in the particle swarm. The specific steps are as follows:

Sub-step 2.1: For each element $x_i$ in the dataset $D$, the Euclidean distance $d_{ij} = \lVert x_i - \mu_j \rVert_2$ between $x_i$ and the center $\mu_j$ of each cluster is calculated, and $x_i$ is assigned to the cluster $C_j$ whose center is nearest.

Sub-step 2.2: For each cluster $C_j$ obtained in Sub-step 2.1, the center $\mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x$ of that cluster is recalculated and $C = \{\mu_1, \mu_2, \cdots, \mu_k\}$ is updated.

Sub-step 2.3: Repeat Sub-steps 2.1 and 2.2 until the center $\mu_j$ and the members of each cluster $C_j$ no longer change. The final clustering result is then obtained.

Sub-step 2.4: To evaluate the clustering effect of the current position of each particle, the following equation is used to calculate the fitness *F*(*i*) of each particle.

$$F(i) = \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2 \tag{3}$$

where $C_j$ is the *j*-th cluster obtained by decoding the *i*-th particle, $x$ denotes an element of the dataset assigned to $C_j$, and $\mu_j$ is the center of the *j*-th cluster. The fitness function represents the sum of the squares of the distances between each element and the center of the cluster to which the element belongs; the smaller the fitness, the better the clustering effect. The individual optimal solution $P_i$ and the group optimal solution $g_{best}$ can be obtained from the fitness.

The optimal position found by the *i*-th particle is denoted as:

$$P_i = \begin{pmatrix} p_{i1} & p_{i2} & \cdots & p_{iD} \end{pmatrix} \tag{4}$$

The optimal position found by the particle swarm is denoted as:

$$g_{best} = \begin{pmatrix} g_{1} & g_{2} & \cdots & g_{D} \end{pmatrix} \tag{5}$$

Step 3: Position update. Update the velocity and position of each particle with the following equations:

$$V_{i}^{k+1} = \omega V_{i}^{k} + c_{1}r_{1}(P_{i}^{k} - X_{i}^{k}) + c_{2}r_{2}(g_{best}^{k} - X_{i}^{k}) \tag{6}$$

$$X_i^{k+1} = X_i^k + V_i^{k+1} \tag{7}$$

where the superscript $k$ is the iteration index (not the number of clusters). $V_i^k$ denotes the velocity of the *i*-th particle at the *k*-th iteration. $X_i^k$ denotes the position of the *i*-th particle at the *k*-th iteration. $P_i^k$ denotes the individual optimal solution of the *i*-th particle up to the *k*-th iteration. $g_{best}^k$ denotes the population optimal solution of the particle swarm up to the *k*-th iteration. $\omega$ denotes the inertia weight. $c_1$ and $c_2$ denote the acceleration constants that adjust the step size. $r_1$ and $r_2$ denote random numbers between 0 and 1 that enhance the randomness of the search process.

Step 4: After the velocity and position of each particle are updated, the particles that have left the solution range are randomly re-initialized. If the current fitness of a particle is better than that of its historical optimum $P_i$, then update $P_i$. Similarly, if the best fitness in the updated swarm is better than that of the historical optimum $g_{best}$, then update $g_{best}$.

Step 5: Repeat Steps 2 to 4 for all particles until the maximum number of iterations is reached or the aggregation degree $\sigma^2$ of the particle swarm is less than the given threshold.

$$
\sigma^2 = \frac{1}{m} \sum_{i=1}^{m} \left[ F(i) - \overline{F} \right]^2 \tag{8}
$$

where $\overline{F}$ is the average fitness of the particle swarm, and $\sigma^2$ represents the aggregation degree of the particles in the particle swarm. The smaller its value, the higher the convergence degree of the PSO algorithm. When $\sigma^2$ is less than the given threshold, the particles are all clustered near the global optimum. At this point, the particle with the best fitness encodes the initial centers of the globally optimal clustering, and the clustering result is obtained by running k-means from these centers.
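Steps 1 to 5 can be sketched in a few lines for the one-dimensional case, which matches clustering a single feature series at a time. This is an illustrative sketch under stated assumptions, not the implementation used in the study: the function names, the default values of `w`, `c1`, `c2`, `tol`, the swarm size, and the choice to re-draw individual coordinates that leave the data range are all assumptions introduced here.

```python
import random

def kmeans_1d(data, centers, max_iter=100):
    """Run k-means (Sub-steps 2.1 to 2.3) on 1-D data from the given initial centers."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for x in data:
            nearest = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # An empty cluster keeps its previous center.
        updated = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    # Fitness as in Equation (3): squared distance of each element to its nearest center.
    sse = sum(min((x - c) ** 2 for c in centers) for x in data)
    return centers, sse

def pso_kmeans(data, k=3, n_particles=20, n_iter=30, w=0.7, c1=1.5, c2=1.5,
               tol=1e-12, seed=0):
    rng = random.Random(seed)
    lo, hi = min(data), max(data)
    # Step 1: each particle position encodes k candidate initial centers.
    X = [[rng.uniform(lo, hi) for _ in range(k)] for _ in range(n_particles)]
    V = [[0.0] * k for _ in range(n_particles)]
    # Step 2: the fitness of a particle is the k-means SSE reached from its position.
    P = [x[:] for x in X]
    P_fit = [kmeans_1d(data, x)[1] for x in X]
    best = min(range(n_particles), key=P_fit.__getitem__)
    g_best, g_fit = P[best][:], P_fit[best]
    for _ in range(n_iter):
        g_prev = g_best[:]  # all particles use the swarm optimum of the previous iteration
        fits = []
        for i in range(n_particles):
            for d in range(k):
                # Step 3: velocity and position update, Equations (6) and (7).
                r1, r2 = rng.random(), rng.random()
                V[i][d] = (w * V[i][d] + c1 * r1 * (P[i][d] - X[i][d])
                           + c2 * r2 * (g_prev[d] - X[i][d]))
                X[i][d] += V[i][d]
                # Step 4: re-draw coordinates that leave the data range.
                if not lo <= X[i][d] <= hi:
                    X[i][d] = rng.uniform(lo, hi)
            f = kmeans_1d(data, X[i])[1]
            fits.append(f)
            if f < P_fit[i]:
                P[i], P_fit[i] = X[i][:], f
            if f < g_fit:
                g_best, g_fit = X[i][:], f
        # Step 5: stop when the fitness variance of the swarm, Equation (8), is small.
        mean_fit = sum(fits) / len(fits)
        if sum((f - mean_fit) ** 2 for f in fits) / len(fits) < tol:
            break
    centers, sse = kmeans_1d(data, g_best)
    return sorted(centers), sse
```

Because the fitness is evaluated after running k-means from each particle position, the swarm searches over initial centers rather than over final centers, which is the encoding described in Step 1.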

#### *2.3. Association Rule Mining and Apriori Algorithm*

Association rule mining refers to the discovery of valuable correlation information and knowledge rules from data sets. The Apriori algorithm is the most classic algorithm for mining association rules. Suppose $I = \{i_1, i_2, \cdots, i_m\}$ is an item set, each element of which is called an item, and an item set of length *k* is called a *k*-itemset. A subset of the item set *I* forms a transaction, and multiple transactions form a transaction database $T = \{t_1, t_2, \cdots, t_n\}$. Suppose *X* and *Y* are two item sets occurring in the transaction database whose intersection is empty, that is, $X \subset I$, $Y \subset I$ and $X \cap Y = \emptyset$. An association rule between them is denoted by *X* ⇒ *Y*, where the former term *X* is the condition of the rule and the latter term *Y* is its conclusion. To measure the quality of the mined association rules, three indicators are used: support, confidence and lift. Their definitions are as follows:

Support is the probability that *X* and *Y* occur together in the transaction database *T*. Support indicates the importance of association rule *X* ⇒ *Y* in the total data:

$$S_{X \Rightarrow Y} = \frac{|T(X \cup Y)|}{|T|} \tag{9}$$

Confidence is the probability that *Y* will occur if *X* is included. Confidence expresses the validity of the association rule *X* ⇒ *Y*:

$$C_{X \Rightarrow Y} = \frac{|T(X \cup Y)|}{|T(X)|} \tag{10}$$

Lift is the ratio of the confidence to the occurrence probability of the latter term *Y* in the transaction database *T*. Lift indicates the strength of the correlation, and the larger the lift, the stronger the correlation:

$$L_{X \Rightarrow Y} = \frac{|T(X \cup Y)|}{|T(X)|} \bigg/ \frac{|T(Y)|}{|T|} \tag{11}$$

where |*T*(*X* ∪ *Y*)| represents the number of transactions in the transaction database *T* that contain both *X* and *Y*, |*T*| represents the total number of transactions in *T*, and |*T*(*X*)| and |*T*(*Y*)| represent the number of transactions in *T* that contain *X* and *Y*, respectively.
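Equations (9) to (11) translate directly into counting operations over a list of transactions. The sketch below is illustrative only; the function names are assumptions introduced here, and each transaction is assumed to be a Python `set` of item labels.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items` (Equation (9))."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def rule_metrics(transactions, X, Y):
    """Return (support, confidence, lift) of the rule X => Y."""
    s_xy = support(transactions, set(X) | set(Y))
    s_x = support(transactions, X)
    s_y = support(transactions, Y)
    confidence = s_xy / s_x      # Equation (10)
    lift = confidence / s_y      # Equation (11)
    return s_xy, confidence, lift
```

As a worked example with made-up labels: if a rule's former term appears in 2 of 5 transactions, its latter term also appears in 2 of 5, and both appear together in 2 of 5, then the support is 0.4, the confidence is 1.0, and the lift is 1.0 / 0.4 = 2.5. A lift above 1 means the latter term occurs more often when the former term is present than it does overall.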

The minimum support *min*\_*supp* and minimum confidence *min*\_*conf* need to be specified as thresholds in association rule mining. If the support of an item set is greater than *min*\_*supp*, then this item set is called a frequent item set. If the support and confidence of an association rule are greater than *min*\_*supp* and *min*\_*conf*, respectively, then this rule is called a strong association rule. The specific flow of the Apriori algorithm is shown in Figure 1 and described in detail as follows:

Step 1: Iterate through all the transactions in the transaction database *T*, count the occurrences of each item, and calculate its support. Items whose support does not exceed *min*\_*supp* are deleted to generate the frequent 1-item set $L_1$.

Step 2: Generate the candidate 2-item sets from $L_1$ by joining and pruning operations, calculate the support of each candidate, and filter according to *min*\_*supp* to get the frequent 2-item set $L_2$. Repeat this process, generating the candidate (*k* + 1)-itemsets from $L_k$, until the candidate set is empty, thus obtaining the frequent item sets of every length.

Step 3: For each frequent item set in $L_k$, calculate the confidence of the rules it generates, and output the association rules with confidence greater than *min*\_*conf*.
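The three steps can be sketched compactly as below. This is a minimal, unoptimized illustration rather than the implementation used in the study; the function names are assumptions introduced here, transactions are assumed to be Python sets, and thresholds are treated as strict ("greater than"), following the definitions above.

```python
from itertools import combinations

def apriori(transactions, min_supp):
    """Return {frozenset: support} for every frequent item set."""
    n = len(transactions)

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Step 1: frequent 1-item sets.
    items = {item for t in transactions for item in t}
    level = {frozenset([i]): supp(frozenset([i])) for i in items}
    level = {s: v for s, v in level.items() if v > min_supp}
    frequent = dict(level)
    k = 2
    # Step 2: join, prune, and filter until no candidate survives.
    while level:
        prev = set(level)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = {c: supp(c) for c in candidates}
        level = {c: v for c, v in level.items() if v > min_supp}
        frequent.update(level)
        k += 1
    return frequent

def strong_rules(frequent, min_conf):
    """Step 3: list (X, Y, support, confidence) for rules X => Y above min_conf."""
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for X in map(frozenset, combinations(itemset, r)):
                conf = s / frequent[X]  # every subset of a frequent set is frequent
                if conf > min_conf:
                    rules.append((X, itemset - X, s, conf))
    return rules
```

The pruning step relies on the property that every subset of a frequent item set is itself frequent, which is also why `frequent[X]` is always available when the confidence is computed.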

#### **3. Study Area**

#### *3.1. Landslide Overview*

The Lishanyuan landslide is located in Xinhua County, Hunan Province, China (Figure 2). The longitudinal length of the landslide is 120 m, the horizontal width is 300 m, the average thickness is about 3 m, and the total volume is about 1.08 × 10<sup>5</sup> m<sup>3</sup>. The landslide is a shallow landslide with a main slide direction of 210°. The middle and back edges of the slope are well covered with vegetation. There are several residential houses at the left foot of the slope. The area on the right side of the slope is poorly covered with vegetation. There is a village-level road and a small stream at the front edge of the landslide. Due to long-term river scouring at the foot of the slope, the landslide initially showed accelerated deformation characteristics in 1996. From then until 2012, it underwent a slow deformation trend. In 2013, the landslide accelerated again, with multiple cracks on the slope and subsidence of the village-level road. In April 2018, affected by heavy rainfall, the landslide had a local slip of about 600 m<sup>3</sup>, and the sliding soil fell onto the walls and windows of residential houses on the lower side of the slope, causing a direct loss of about 600,000 RMB. According to the on-site investigation, the landslide is a small and shallow traction landslide, a type that is very common and representative in Hunan Province, China.

**Figure 2.** Geographical location and monitoring scheme of Lishanyuan landslide. (**a**) Site photograph of the Lishanyuan landslide. (**b**) Geographical location of the Lishanyuan landslide. (**c**) Photographs of monitoring stations DB02 and YL01. (**d**) Photograph of the DB01 monitoring station.

#### *3.2. Deformation Characteristics*

To protect the safety of the residents below the landslide, we completed the deployment and commissioning of monitoring equipment to establish a monitoring and early warning system on 15 April 2021. The location and photos of the monitoring stations are shown in Figure 2. Two GNSS monitoring stations, named DB01 and DB02, were deployed on the main slide profile of the landslide, and the GNSS base station is located on the roadside on the lower side of the landslide. In addition, a rain gauge named YL01 was deployed at DB02. The automated monitoring system received the first monitoring data at 17:00 on 15 April, and the default acquisition interval of the GNSS monitoring stations was 1 h. As the landslide appeared to accelerate significantly on 17 May, the collection interval of the GNSS monitoring stations was adjusted to 30 min, and the collection interval of the rain gauge was adjusted to 20 min. As of 15:00 on 1 July 2022, a total of 57,597 monitoring records had been collected by the monitoring system, including 30,396 GNSS records and 27,201 rainfall records. The monitoring data are shown in Figure 3.

From Figure 3, it can be seen that the deformation patterns of the two GNSS monitoring stations are basically the same, but the deformation amplitude of DB02 is significantly larger than that of DB01. This indicates that the deformation of the leading edge of the landslide is larger than that of the trailing edge, which is consistent with the deformation characteristics of a traction landslide. The threshold design and warning process of this landslide are described in Bai et al. [16]. The deployed monitoring and warning system is able to accurately and quickly identify the accelerated deformation process of the landslide and report timely warnings. To verify the reliability of the monitoring data, we inspected the landslide site on 19 May 2021. At this time, the landslide area had just experienced a strong rainfall, and the monitoring data from the two GNSS monitoring stations showed that the landslide had deformed violently. During the site inspection, we found multiple cracks in the landslide body (Figure 4), obvious slippage, soil accumulation at the foot of the slope, and small local mudslides. These macroscopic phenomena are consistent with the monitoring and early warning results, supporting the effectiveness and reliability of the monitoring and early warning system.

**Figure 3.** Daily rainfall data and displacement data from two GNSS monitoring stations for the Lishanyuan landslide.

**Figure 4.** On-site inspection photos on 19 May 2021. (**a**) Long cracks on the surface of the landslide. (**b**) Loose deposit near the DB02 station. (**c**) Multiple cracks near the DB01 station. (**d**) Partial collapse near the DB02 station.

The deformation process of the Lishanyuan landslide shows an obvious correlation with the rainfall process. For about a week after monitoring started, the displacement of the two GNSS monitoring stations fluctuated within about 10 mm, indicating that the measurement accuracy of GNSS was at the centimeter level. Affected by the rainfall event on 22 April 2021, the displacements of the two GNSS monitoring stations began to accelerate synchronously at 4:00 a.m. on 23 April. After that, the displacements of the two monitoring stations showed a step-like growth, and each severe deformation process was accompanied by concentrated high-intensity rainfall. After mid-October, the rainfall decreased, and the deformation began to slow down, showing creep characteristics. After April of the following year, the landslide again entered a process of obvious deformation and acceleration.

#### *3.3. Feature Engineering*

From the deformation characteristics reflected by the Lishanyuan landslide monitoring data, we found that the deformation process of the landslide showed an obvious correlation with the rainfall process. To mine the association rules behind this correlation, feature engineering was needed first. Feature engineering refers to extracting more representative features from raw monitoring data to improve the effectiveness of mining tasks. Based on the monitoring data and deformation characteristics of the Lishanyuan landslide, we constructed features for both deformation and rainfall. In terms of deformation, we focused on the accelerated deformation process, so the deformation velocities ($\upsilon_{DB01}$, $\upsilon_{DB02}$) of the two GNSS monitoring stations were chosen as the main features. In terms of rainfall, we paid attention not only to the short-term rainfall features but also to the long-term rainfall features. We chose the cumulative rainfall over 3 h ($q_{3h}$), 6 h ($q_{6h}$), 12 h ($q_{12h}$), 24 h ($q_{24h}$), 3 days ($q_{3d}$), and 7 days ($q_{7d}$) as the features reflecting rainfall.
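The six cumulative rainfall features can be computed with time-based trailing windows, which also handles the change in sampling interval during the monitoring period. The sketch below is a minimal illustration: the record layout (a list of `(timestamp, rainfall_mm)` pairs), the function names, and the half-open window convention are assumptions introduced here, not details given in the text.

```python
from datetime import datetime, timedelta

# Trailing windows for the six rainfall features named in the text.
WINDOWS = {
    "q3h": timedelta(hours=3),
    "q6h": timedelta(hours=6),
    "q12h": timedelta(hours=12),
    "q24h": timedelta(hours=24),
    "q3d": timedelta(days=3),
    "q7d": timedelta(days=7),
}

def cumulative_rainfall(records, now, window):
    """Sum rainfall over records with timestamps in the interval (now - window, now]."""
    return sum(mm for t, mm in records if now - window < t <= now)

def rainfall_features(records, now):
    """Return all six cumulative rainfall features at time `now`."""
    return {name: cumulative_rainfall(records, now, w) for name, w in WINDOWS.items()}
```

Using timestamps rather than a fixed number of samples keeps the window length constant in time even when the acquisition interval changes, for example from 1 h to 20 min.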

According to Bai et al. [40] and Liu et al. [41], the strength of correlation between features can be quantitatively determined by gray relation analysis. Therefore, we used the gray relation analysis algorithm to calculate the gray relation degree between various types of rainfall features and deformation velocity; the calculation results are shown in Table 1. From Table 1, we can see that the gray relation degree of all rainfall features and deformation velocity is greater than 0.9, which is much higher than the empirical threshold of 0.6. So, all of these rainfall features can be adopted.
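The text does not state which formulation of gray relation analysis was used. As a hedged illustration only, the sketch below follows a commonly used form of the gray relational degree, with min-max normalization, a distinguishing coefficient `rho` of 0.5, and the minimum and maximum differences taken over a single pair of sequences; all of these choices, and the function name, are assumptions introduced here.

```python
def gray_relation_degree(reference, comparison, rho=0.5):
    """Gray relational degree between two equally long sequences, in (0, 1]."""

    def normalize(seq):
        lo, hi = min(seq), max(seq)
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in seq]

    ref = normalize(reference)
    cmp_seq = normalize(comparison)
    diffs = [abs(a - b) for a, b in zip(ref, cmp_seq)]
    d_min, d_max = min(diffs), max(diffs)
    if d_max == 0:
        return 1.0  # identical shapes after normalization
    # Relational coefficient at each time step, then averaged.
    coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in diffs]
    return sum(coeffs) / len(coeffs)
```

In this form, a value close to 1 indicates that the two normalized sequences follow nearly the same shape, which is the sense in which a degree above the empirical threshold of 0.6 is read as a strong relation.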


**Table 1.** Gray relation degree between rainfall characteristics and displacement characteristics.

#### **4. Results**

#### *4.1. Clustering Results*

For the various types of feature sequences obtained from feature engineering, we used the PSO-optimized k-means algorithm to cluster each feature. The number of cluster centers for each type of feature was set to 3, thereby clustering the feature into low, medium, and high clusters. The clustering results of all features are shown in Figure 5, and the interval ranges and sample sizes of different clusters are shown in Table 2.

**Figure 5.** Visualization of all feature clustering results. (**a**) The velocity of DB01. (**b**) The velocity of DB02. (**c**) 3-h cumulative rainfall. (**d**) 6-h cumulative rainfall. (**e**) 12-h cumulative rainfall. (**f**) 24-h cumulative rainfall. (**g**) 3-day cumulative rainfall. (**h**) 7-day cumulative rainfall.


**Table 2.** Interval range and sample size of all feature clustering results.

Combining Figure 5 and Table 2, it can be seen that the numbers of samples in different clusters differ greatly. The number of samples in low-rank clusters is much higher than that in middle-rank and high-rank clusters, and the number of samples in middle-rank clusters is also much higher than that in high-rank clusters. Taking $\upsilon_{DB01}$ as an example, the DB01-Low-Velocity cluster contains 4887 samples with velocities between −3.64 and 4.70. The DB01-Medium-Velocity cluster contains 253 samples with velocities between 4.78 and 15.54, an order of magnitude fewer than the DB01-Low-Velocity cluster. The DB01-High-Velocity cluster contains only 56 samples with velocities between 16.49 and 60.96, less than a quarter of the size of the DB01-Medium-Velocity cluster. The clustering results of the other features have similar characteristics to $\upsilon_{DB01}$, differing only in the interval ranges. The boundaries between the different clusters are very clear, and the characterized velocities or rainfall intensities are largely consistent with the actual situation.

#### *4.2. Association Rule Mining Results*

After clustering, each cluster is named, and the values of each feature are converted into the corresponding category names. The category names of the different features at each moment form an item set, thereby transforming the entire feature dataset into a transaction database. The Apriori algorithm was used to mine this transaction database for strong association rules between the rainfall features and the velocities of the two GNSS monitoring stations separately. We took the velocity of the GNSS monitoring stations as the latter term and the rainfall characteristics as the former term, and obtained the corresponding strong association rules for different values of *min*\_*conf* and *min*\_*supp*. For the velocity of the DB01 monitoring station, we set *min*\_*supp* to 0.3% and *min*\_*conf* to 80%. For landslide warning, we focused on the high-speed deformation process, which corresponds to the DB01-High-Velocity cluster, so we filtered the eligible association rules as shown in Table 3.
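The conversion from numeric features to a transaction database can be sketched as follows. This is an illustration under stated assumptions: each value is labeled by its nearest cluster center, the three clusters are named Low, Medium, and High by ascending center, and the feature names, label format, and center values in the usage note are hypothetical.

```python
LEVELS = ("Low", "Medium", "High")

def label_series(name, values, centers):
    """Replace each value with '<name>-<level>' of its nearest cluster center."""
    ordered = sorted(centers)
    labels = []
    for v in values:
        j = min(range(len(ordered)), key=lambda c: abs(v - ordered[c]))
        labels.append(f"{name}-{LEVELS[j]}")
    return labels

def build_transactions(features, centers):
    """Turn {feature: series} into one item set per time step.

    `features` maps a feature name to its time-aligned values and `centers`
    maps the same name to its three cluster centers.
    """
    columns = [label_series(name, features[name], centers[name]) for name in features]
    return [set(row) for row in zip(*columns)]
```

For instance, with hypothetical centers `[1.0, 15.0, 45.0]` for a feature called `q24h`, a value of 40.0 would be labeled `q24h-High`, and the labels of all features at the same time step together form one transaction.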


**Table 3.** Association rules related to Lishanyuan landslide deformation.

For the velocity of the DB02 monitoring station, we set *min*\_*supp* to 0.1% and *min*\_*conf* to 80%. We filtered the association rules with DB02-High-Velocity as the latter term in the same way (see Table 3).

Several observations can be drawn from the association rules in Table 3. First, the lift of all these association rules is much greater than 1, indicating that the presence of the rainfall former terms in these rules has a significant positive effect on the high-speed deformation of the landslide. Second, if the rainfall characteristics are classified into current-moment (3 h, 6 h), short-term (12 h, 24 h), and long-term (3 days, 7 days), then the current-moment rainfall characteristics are not significant in the association rules. For example, Rules 3–8 and 11–16, which contain current-moment rainfall characteristics, can be considered subordinate to the four main rules: Rule 1, Rule 2, Rule 9, and Rule 10. Third, the four main rules show that the high-speed deformation of the landslide requires not only short-term rainfall characteristics but also long-term rainfall characteristics; the occurrence of only one of them does not induce the high-speed deformation process. Fourth, for the DB01 monitoring station, the long-term heavy rainfall characteristics are more important for high-speed deformation of the landslide, because the three-day or seven-day rainfall characteristics in Rules 1–8 are heavy rainfall, whereas the 12- and 24-h rainfall characteristics can be low-intensity rainfall. Fifth, for the DB02 monitoring station, not only the long-term heavy rainfall characteristics of 3–7 days but also the short-term heavy rainfall characteristics of 24 h are required.

In conclusion, by analyzing the monitoring data of the Lishanyuan landslide, it can be initially concluded that the landslide is caused by rainfall. Through association rule mining, the disaster factors can be more accurately identified as the combination of short-term rainfall and long-term heavy rainfall. When making early warning decisions, a rainfall within 24 h and a heavy rainfall with a cumulative rainfall greater than 130.60 mm within 7 days can be used as a precursor to identify the high-speed deformation of the landslide.

#### **5. Discussion**

To analyze the disaster factors of the Lishanyuan landslide and determine the precursors of high-speed deformation of the landslide, we used a combination of the PSO-optimized k-means clustering algorithm and the Apriori algorithm to mine the association rules of the monitoring data. The analysis of the mined strong association rules shows that the high-speed deformation process of the Lishanyuan landslide is mainly affected by the combination of short-term rainfall of about 1 day and long-term heavy rainfall of about 3–7 days. A rainfall within 24 h and a heavy rainfall with a cumulative rainfall greater than 130.60 mm within 7 days can be used as a precursor to identify the high-speed deformation of the landslide. Such a precursor can improve early warning capability.

The association rule mining algorithm used in this study has the following main advantages. First, we used the sliding window method to extract features in the feature engineering part. Compared with considering only the features at the current moment, this method improves data utilization by considering continuous data over a period of time, thus improving the reliability and representativeness of the obtained features. Second, the original k-means clustering algorithm is optimized by using the PSO algorithm, which effectively prevents the clustering results from falling into a local optimum. Third, the k-means algorithm is simple to implement and only requires a given number of clusters, which is easy to quantify. Other clustering methods that do not require specifying the number of clusters often require specifying other hyperparameters that are difficult to quantify. Directly specifying the number of clusters is a more convenient way to control the clustering results. Finally, this study is based on real-time monitoring data, whose sampling intervals are hourly or even on the minute scale. Compared with ultra-long-term monitoring data at the monthly scale, such data are richer and capture the short-term deformation patterns of landslides, which is of great significance for early warning.

Additionally, it should be noted that our improvement of the association rule mining method results in an increase in algorithm complexity. On the one hand, we use the PSO algorithm to optimize the k-means clustering process, which is an evolutionary algorithm that requires uninterrupted iterative computation of many potential solutions, which is very complex and time-consuming. On the other hand, the Apriori algorithm for mining association rules needs to scan the entire transaction database when processing frequent candidate sets, which has high algorithm complexity, a huge amount of calculation, and is very time-consuming. With the improvement of technology and the passage of monitoring time, the number of monitored landslides and the volume of data will also increase sharply in the future. It is an inevitable trend to explore simple and fast data mining algorithms.

In this study, the Apriori algorithm was used to mine association rules. Therefore, the numerical dataset was converted into a category-type transaction database. This method cannot further quantify association rules and is easily affected by clustering results. Meanwhile, the Apriori algorithm does not consider the time series characteristics of item sets in the mining process of association rules, which results in ignoring the influence of sequence pattern in the mining process. Future research needs to explore a data mining method that uses numerical datasets and considers sequential patterns in order to mine more valuable information.

#### **6. Conclusions**

For the monitoring data of the Lishanyuan landslide, the sliding window method was used to extract the features, and gray relation analysis was used to screen the features. Then the PSO-optimized k-means algorithm was used to cluster. Finally, the Apriori algorithm was used to mine the strong association rules between deformation speed and rainfall characteristics to analyze the disaster factors of the Lishanyuan landslide and propose the precursors that can be used for early warning. The following conclusions were obtained from this study:

The sliding window method was adopted to extract features from high-frequency monitoring data, which makes full use of the data and yields more representative features. Clustering the extracted features with the PSO-optimized k-means algorithm effectively prevents the clustering results from falling into a local optimum. Through clustering, the numerical dataset is transformed into a transaction database, from which strong association rules can be mined using the Apriori algorithm. This research mined association rules from monitoring data at the hourly or even minute scale. Compared with ultra-long-term monitoring data at the monthly scale, such data capture short-term deformation patterns, which are more conducive to short-term real-time early warning.

The results of association rules mining show that the high-speed deformation process of the Lishanyuan landslide is mainly affected by the combination of short-term rainfall of about 1 day and long-term heavy rainfall of about 3–7 days. A rainfall within 24 h and heavy rainfall with a cumulative rainfall greater than 130.60 mm within 7 days can be used as a precursor to identify the high-speed deformation of the landslide.

The association rule mining algorithm used in this paper is highly complex, computationally intensive, and very time-consuming. Simpler and faster algorithms need to be explored in the future to cope with the monitoring and early warning of a growing number of landslides. In addition, this mining process does not consider the time-series characteristics of item sets, and future research should explore sequential pattern mining, which may uncover more valuable information.

**Author Contributions:** Conceptualization, J.X., D.B.; methodology, J.X., D.B.; software, J.X., D.B.; validation, J.X. and J.L.; formal analysis, J.X.; investigation, J.X.; resources, J.X. and H.H.; data curation, J.X.; writing—original draft preparation, J.X.; writing—review and editing, J.X., H.H., J.L., D.B., G.L.; visualization, D.B. and J.X.; supervision, J.X.; project administration, J.X.; funding acquisition, G.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Key research and development program of Hunan Province of China, grant number: 2020SK2135. Natural Resources Research Project in Hunan Province of China, grant number: 2021-15.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** We would like to thank the editor and the reviewers for helping us improve the quality of the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

