*2.2. Date Used*

As outlined earlier, the methodology has two main parts. One is the procedure of generating a land subsidence inventory using the PS-InSAR technique, which needs a satellite SAR dataset. The other is the conditioning factors that are taken into consideration for hazard modeling. In the following section, the process of acquiring and preparing these datasets is discussed.

#### 2.2.1. SAR Data

To generate a land subsidence inventory in order to acquire the training and testing data necessary for LSSM, sentinel-1A single look complex (SLC) SAR data provided by European Space Agency (ESA) were used. In total, 31 SAR images were obtained from January 2019 to January 2020 in ascending track with dual polarization (vertical–vertical and vertical–horizontal) (Table 1).

**Table 1.** The details of the SAR data used.


#### 2.2.2. Factors Affecting Land Subsidence

To date, there is no single guideline unanimously applied for selecting subsidenceaffecting factors. However, based on the literature of LSSM studies carried out in Iran [13,23,33,37], data that were directly and indirectly related to the subsidence-inducing factors are gathered and used for mapping LSS in the study area. These input data include the altitude, slope, aspect, plan curvature, profile curvature, topographic wetness index (TWI), distance to stream, distance to road, stream density, groundwater drawdown, and land use/land cover (LULC) (Table 2). Altitude (Figure 3a) and its derivative factors such as slope (Figure 3b), aspect (Figure 3c), TWI (Figure 3d), plan (Figure 3e), and profile (Figure 3f) are among crucial topo-hydrological criteria of ground subsidence [2]. An advanced space borne thermal emission and reflection radiometer (ASTER) digital elevation model (DEM) was obtained (1 arc second or approximately 30 m resolution) and processed in the GIS environment to produce topography-related factor layers. Land subsidence can cause deformations in the Earth's surface slope and topography [43]. Therefore, altitude and its derivatives were considered because they can directly affect LS. The slope can have a potential impact on runoff infiltration since steep slopes bring about less recharge due to limited infiltration of rainfall [2,37]. As mentioned above, the main cause of land subsidence in Iran is excessive underground water extraction. Undue groundwater extraction results in pore water pressure (PWP) decrease and aquifer compaction increase [44]. Therefore, the well inventory of the study area was acquired from the Iranian department of water resource management (IDWRM) to generate groundwater drawdown (Figure 3k). Further, distance to stream (Figure 3i) and stream density (Figure 3j) were used, as these factors can impact the groundwater level by recharging the groundwater tables [45]. The network of streams was extracted from the ASTER DEM. Another geomorphological influential factor is the ground lithology of the area. However, we could not take the lithological layer into account since this factor did not have substantial variation in the region of interest. Road data were obtained from the open street map (OSM) at the scale of 1:100,000, and distance to a road was calculated in the GIS environment (Figure 3h). Google Earth Engine (GEE) cloud computing, gathering massive volumes of various satellite imagery alongside popular machine learning algorithms, is a suitable platform for analyzing geo big data and monitoring the environment [46,47]. The available 10-m Iran-wide LULC map was used. The map was generated in the GEE platform using Sentinel-1 and Sentinel-2 images and object-based random forest classifier with 95% overall accuracy [48]. All the data were generated or resampled to a 30 m pixel size.

**Table 2.** The details of the input conditioning factors.


**Figure 3.** *Cont*.

**Figure 3.** Land subsidence conditioning factors: (**a**) altitude; (**b**) slope; (**c**) slope aspect; (**d**) topographic wetness index (TWI); (**e**) plan curvature; (**f**) profile curvature; (**g**) land cover; (**h**) distance to road; (**i**) distance to stream; (**j**) stream density and (**k**) groundwater drawdown.

#### **3. Methodology**

In this section, the methods and models used in different parts of the approach are presented in detail. First, the PS-InSAR technique used for generating land subsidence map of the study area and the LS inventory are discussed. Then, the ANFIS model and evolutionary algorithms, GWO and ICA, are stated. Moreover, FR analysis and the approach to optimize ANFIS using meta-heuristics are presented. Finally, the validation methodology of this paper is outlined.

#### *3.1. PS-InSAR Technique*

SAR Interferometry (InSAR) is a well-established technique for measuring and monitoring ground deformation with millimetric accuracy. This is mainly based on the phase difference between SAR images acquired at different times and slightly different sensor positions. Time-series InSAR analysis aims to identify coherent image pixels (persistent scatterers (PSs)), which have high phase stability and reflect strong backscatter to the

satellite over a long time period. A baseline configuration can determine the set of interferometric image pairs, which is used in the time series analysis. The baseline shows the distance between two images, in terms of antenna position (spatial baseline), acquisition time (temporal baseline), or Doppler centroid (Doppler baseline). The single-master configuration, where each image is co-registered to a unique master image, is the most common one for PSI analysis [49,50]. The master image is chosen in the middle of the 2D spatio-temporal space so that the high coherence of all formed interferometric pairs (interferograms) is guaranteed. The interferogram contains the ground deformation phase component as well as some other distinct contributions, such as atmospheric disturbance, topographic, and flat-Earth terms. These components are removed in the next step from the interferometric phase using an external DEM.

A spatial network is formed using a primary set of points as PS candidates (PSCs) to estimate the Atmospheric Phase Screen (APS) and further densification of PS points. Since the original interferometric phase is wrapped (i.e., phase observations in the [−*π*, +*π*)), and it is composed of a large number of phase contributions, the PSCs cannot be selected based on the phase. Thus, the amplitude dispersion index (ADI) based on Equation (1) is used as an approximation of phase stability. A point can be selected as a PSC if it always has a higher amplitude value than a suitable threshold. It was proved that assuming sufficient data images, the phase behavior with the standard deviation (*σv*) lower than a threshold of 0.25 is similar to the trend of ADI. Hence, this index, which represents the phase stability of points, can be used for PSCs selection.

$$D\_A = \frac{\sigma\_A}{m\_A} \simeq \sigma\_v \le 0.25\tag{1}$$

where *σ<sup>A</sup>* is the standard deviation and *mA* is the average amplitude value of each pixel over time. Next, the spatial network is used to estimate the unknown parameters, DEM error (residual topographic phase component), and the deformation rate, along with each connection between two adjacent PSCs through the maximization of a periodogram (Equation (2)) [51]. All PSI methods are based on assumptions regarding the spatial and temporal smoothness of the deformation signal, expressed by a model. Here, the model is considered as a linear deformation trend in time.

$$\xi\left[\Delta\nu\left(p\_{ij}\right).\Delta\ln\left(p\_{ij}\right)\right] = \frac{1}{N}\sum\_{s=1}^{N}e^{j\left[\Delta\rho\_{s,k}\left(\rho\_{ij}\right)-\frac{4\pi}{\lambda}\Delta\nu\left(p\_{ij}\right)B\_{ts}-\frac{4\pi}{\lambda k\sin\theta}\Delta\ln\left(p\_{ij}\right)B\_{ns}\right]}\tag{2}$$

where *pij* demonstrates the connection between adjacent PSCs *pi* and *pj*. *N* is the number of interferograms. The term Δ*ϕs*.*<sup>k</sup>* is the double difference interferometric phase in image pairs *s* and *k*, while *Bt*.*<sup>s</sup>* and *Bn*.*<sup>s</sup>* are the temporal and interferometric normal baselines, respectively. *θ* refers to the incidence angle of the SAR signal.

For each PSC, the average residual phase after correction for the modeled parameters is taken to obtain an estimate for the atmospheric signal in the master image. Then, the atmospheric signal of the slave images as well as phase noise is separated from un-modeled deformation based on high-pass filtering. After APS removal, to increase point density, the second set of PSs is selected using a higher threshold for the ADI criterion. The unknown parameters are re-estimated for all pixels based on another maximization of the periodogram [49]. Eventually, temporal coherence, which is a function of residual phase noise, is used to determine the final PSs which build the land deformation map.

#### *3.2. Adaptive Neuro-Fuzzy Inference System (ANFIS)*

Fuzzy inference systems (FIS) are capable of depicting multifaceted processes using the concepts and if-then rules. However, they are incapable of learning [52]. Furthermore, if the number of input variables is too large, then selecting the membership functions and setting the fuzzy rules will become challenging [53]. On the other hand, learning algorithms automatically choose the suitable set of parameters for fuzzy membership functions despite

their inability to explain the system under study. Thus, the ANFIS model [54] is a mixture of ANN and FL benefiting from both ANN's computation capability and FL's decision making. The structure of the ANFIS model contains five layers, called adaptive and fixed [55]. The ANFIS model employs the Takagi–Sugeno–Kang fuzzy algorithm in two rules of 'if-then' with two inputs, *x* and *y*, and one output *f* for both as follows [56]:

$$\begin{array}{l}\text{Rule 1}: if \,\text{x is } A\_1 \text{ and } y \text{ is } B\_{1\prime} \\\text{then } f\_1 = p\_1 \mathbf{x} + q\_1 \mathbf{y} + r\_1 \end{array} \tag{3}$$

$$\begin{array}{l}\text{Rule 2}: if \text{ x is } A\_2 \text{ and } y \text{ is } B\_2, \\\text{then } f\_2 = p\_2 \text{x} + q\_2 y + r\_2 \end{array} \tag{4}$$

Each node contains adaptive nodes, and input variables are fuzzified in first layer (Equations (5) and (6)):

$$O\_{1,i} = \mu A\_{i(x)}\tag{5}$$

$$O\_{1,i} = \mu B\_{i(y)} \tag{6}$$

where, *x* and *y* are the input nodes, *A* and *B* are the linguistic variables, and μ*Ai*(*x*) and μ*Bi*(*y*) are membership functions for that node. The second layer contains fixed nodes denoted as ୡ to compute the strength of the rules. The output of each node is the product of all input signals to that node (Equation (7)):

$$O\_{2,i} = \mathcal{W}\_i = \mu A\_{i(x)} \mu B\_{i(y)}, \ i = 1, \ 2 \tag{7}$$

where *Wi* is the output for each node.

The third layer encompasses fixed nodes denoted as N. The nodes in this layer are the normalized outputs of the second layer, which are referred to as the normal firepower (Equation (8)):

$$O\_{3,i} = w\_i = \frac{w\_i}{\sum\_{j=1}^{2} w\_j}, \; i = 1, \; 2 \tag{8}$$

All nodes in the fourth layer are adaptive and associated with a node function described by the following equation:

$$O\_{4,i} = w\_i f\_i = w\_i (p\_i \mathbf{x} + q\_i \mathbf{x} + r\_i) \tag{9}$$

where *wi* is the normalized firepower of third layer and *pi*, *qi*, and *ri* are node parameters. The parameters of this layer are can be interpreted as the result parameters.

The final layer has only one node denoted as Σ, which represents the summation of all the input signals:

$$O\_{5,i} = \sum w\_i f\_i = \frac{\sum w\_i f\_i}{\sum w\_i} \tag{10}$$
