**2. Materials and Methods**

*2.1. Materials*

#### 2.1.1. GNSS Datasets

This study utilized GNSS vertical deformation sequences provided by CMONOC, and the distribution of the GNSS stations is shown in Figure 1b. The period of each station is not consistent due to the difference in the station establishment time, and the periods of GNSS arrays are shown in Figure 1a. The study period was chosen as 2011–2020 to ensure the completeness of vertical deformation sequences. There were 263 original GNSS stations after removing 6 stations with large period differences, which are shown by the red shadow in Figure 1a. The GNSS observation sequences were calculated using observation, navigation, precision ephemeris, and table files. Furthermore, the daily coordinate solution file was obtained based on GAMIT/GLOBK 10.4, and its specific solution strategy is shown in Table 1 [42]. The GNSS vertical sequences were preprocessed by removing observed outliers that were three times larger than the standard error and system sequence errors caused by earthquakes or antenna replacement.

**Figure 1.** Distribution of continuous observation stations over mainland China. (**a**) The period of each GNSS station. (**b**) The distribution map of GNSS stations.


**Table 1.** Table of GNSS data resolution strategies based on GAMIT/GLOBK.

#### 2.1.2. GRACE Datasets

The primary mission of GRACE satellites is to monitor spatiotemporal variations in the Earth's gravity field on a global scale. Specifically, the gravity field anomaly is not only related to the Earth's rotation but also affected by geophysical phenomena, such as earthquakes, glacial equilibrium adjustments, and oceanic and hydrological changes [29,43]. The gravity variations in GRACE inversions are generally attributed to the large-scale hydrological migration in mainland China. To verify the reliability of DWLIM, this study employed GRACE Mascon (GRACE-M) to compare its results with the DWLIM outcomes. However, the difference in solution strategies causes considerable uncertainty in the single GRACE-M solution. This study utilized the GRACE-M products obtained from 2011 to 2020 provided by the Center for Space Research (CSR) and the Jet Propulsion Laboratory (JPL) of NASA. The TWSA datasets in mainland China were extracted from the boundary files, and the mean value of the two products was considered the final GRACE-M result. Moreover, we did not add additional smoothing, empirical destriping, filtering, or a scaling factor. To compare DWLIM and GLDAS, the mean datasets of GRACE-M were corrected by first-order terms.

$$
\Delta T W S A\_{\text{GRACE-M}} = \frac{\Delta Mascon\_{\text{CSR}} + \Delta Mascon\_{\text{JPL}}}{2} \tag{1}
$$

#### 2.1.3. Auxiliary Datasets

GLDAS V2.2 is an evolution of the earlier Catchment Land Surface Model (CLSM) with 24 variables, including temperature and TWSA. The spatial coverage of daily GLDAS ranges from 60◦ S to 90◦ N in latitude and 180◦ W to 180◦ E in longitude [44]. For the construction and validation of DWLIM, this study used the temperature variables from the GLDAS V2.2 model as the input data for LSTM regression. The TWSA variables from the GLDAS V2.2 model can be regarded as validation data for DWLIM-derived outcomes. In addition, surface pressure sequences from ERA 5 datasets were provided by the European Centre for Medium-Range Weather Forecasts (ECMWF). The spatial resolution of ERA 5 is 0.1◦ × 0.1◦ with global spatial coverage, and the period of time is from 2000 to the present [45].

#### *2.2. Methods*

#### 2.2.1. LSTM Algorithm

LSTM is an improved recursive neural network (RNN) model proposed by Hochreter et al. in 1997 [46]. The LSTM model is trained by constructing memory storage units and using a temporal backpropagation algorithm. This algorithm can solve the problem of gradient disappearance in RNN, and it has no long-term dependence. The standard LSTM model mainly consists of the following: each step *t* with its corresponding input sequence *X*: *x*1, *x*2, ... , *xt*, the input gate *it*, the forget gate *ft*, and the output gate *ot.* The memory unit *ct*

can control the memory and forget the data through different gates, and it is calculated as follows [46].

$$f\_t = \sigma(\mathcal{W}\_f \mathbf{x}\_t + \mathcal{U}\_f h\_t) \tag{2}$$

$$o\_t = \sigma(\mathsf{W}\_o \mathfrak{x}\_t + \mathsf{U}\_o \mathfrak{h}\_t) \tag{3}$$

$$\overline{\mathcal{E}}\_t = \tanh(\mathcal{W}\_c \mathfrak{x}\_t + \mathcal{U}\_c \mathfrak{h}\_t) \tag{4}$$

The memory unit *c j <sup>t</sup>* of the *j* LSTM with unit time *t* can be expressed as follows [46].

$$c\_t^j = i\_t^j \times \overline{c}\_t + f\_t^j \times c\_{t-1}^j \tag{5}$$

When the memory cell is updated, the current hidden layer *h j <sup>t</sup>* can be calculated [46].

$$h\_t^j = o\_t^j \times \tanh(c\_t^j) \tag{6}$$

where *W* denotes the weight matrix of the input process; *U* denotes the state transfer weight matrix, which is an S-shaped function; tanh denotes the hyperbolic tangent function; *σ* denotes the sigmoid function; *ht* denotes the hidden state vector of the output; and *<sup>c</sup>*(*<sup>t</sup>* denotes the new matrix after updating. The three types of gates jointly control the information entering and leaving the memory cell. The input gate regulates the new information entering the memory cell. The forget gate controls how much information is kept in the memory cell, and the output gate defines how much information can be output. The gate structure of LSTM causes the information in the time series to form a balanced long- and short-term dependence for multiple regression purposes.

The original sequences were decomposed into *n* feature signals based on MEEMD due to the few input sequences in this study. The decomposition process is described as follows [47].

$$F = IMF\_1 + IMF\_2 + \dots \, + \, IMF\_n + \, noi\_w \tag{7}$$

where *F* denotes the original feature sequence, *IMF1–IMFn* denote the *n* modal components obtained by decomposing the original sequence, and *noiw* denotes the Gaussian white noise added by MEEMD to be decomposed in the decomposition process.

The geophysical parameters show similar characteristics over a small-scale region. There is a homologous amplitude of crustal deformation where the grid is adjacent to the GNSS station. Therefore, the distance between the grid and GNSS station is considered by using the algorithm of inverse distance weight. The application of inverse distance weight contains three steps. Firstly, the figure center of the grid is regarded as the location coordinates for calculating the distance. Secondly, the distance between the simulated grid and the control GNSS stations is calculated. Finally, we assign the weight to each simulated sequence based on the algorithm of inverse distance weight. The simulated formula is as follows.

$$D\_{\mathcal{S}} = \sum\_{j=1}^{n} \frac{\frac{1}{d\_j}}{\sum\_{i=1}^{n} \frac{1}{d\_i}} \left( \text{Net}\_{LSTM}^j(IMF\_{1\prime}IMF\_{2\prime}, \dots, IMF\_m) \right) \tag{8}$$

where *Dg* denotes the simulated crustal deformation in the unobserved grid by DWLIM; *dj* represents the distance between the center of the grid and the control station; *<sup>n</sup>* ∑ *i*=1 1 *di* denotes the reciprocal sum of the distances between the grid and each control station; *n* denotes the number of the control GNSS stations; *IMF1–IMFm* denote the *m* modal feature components obtained by MEEMD; and *Netjth LSTM* denotes the *j* LSTM regression network. Thus, the simulated crustal vertical deformation of each grid is regressed *n* times and weighted according to the inverse distance weight.

#### 2.2.2. The Crustal Load-Deformation Model

The upper part of the continental crust can be considered an elastic layer; it will cause the elastic response of the surface to settle or rebound when the mass of the Earth's surface changes. This deformation is also called crustal load-deformation. Crustal loaddeformation occurs not only in the vertical direction but also in the horizontal direction. Crustal load-deformation is more sensitive in the vertical direction than that in the horizontal direction. The relationship between crustal loading and crustal load-deformation can be established by the Green function [48], which is calculated as follows.

$$\begin{cases} \begin{aligned} \mathcal{U}\_{\text{green}} &= 2\pi \sum\_{n=0}^{\infty} h\_{\text{\tiny n}} \times \left[ P\_{n-1} (\cos \theta) - P\_{n+1} (\cos \theta) \right] \times \frac{\overline{\text{GR}}}{\overline{\text{g}} (2\pi + 1)} \times P\_{\text{\tiny n}} (\cos \theta), (n > 0) \\\ \mathcal{U}\_{\text{green}} &= 2\pi \sum\_{n=0}^{\infty} h\_{\text{\tiny n}} \times (1 + \cos \theta) \times \frac{\overline{\text{GR}}}{\overline{\text{g}} (2\pi + 1)} \times P\_{\text{\tiny n}} (\cos \theta), (n = 0) \end{aligned} \end{cases} \tag{9}$$

where *θ* denotes the angular radius from the center of the disk; *Pn* denotes the Legendre polynomials; *G* denotes Newton universal gravitational constant, which is equal to 6.67 × <sup>10</sup>−<sup>11</sup> <sup>N</sup> × <sup>m</sup>2/kg2; *<sup>R</sup>* denotes the radius of the Earth; *hn* denotes the loading Love number; and *g* denotes the acceleration of gravity.

DWLIM utilizes hydrological deformation as the input data, and it combines the crustal loading inversion model to obtain the TWSA in the study region. In the crustal loading model, the obtained solutions are regularized using a curvature smoothing algorithm, and the solutions are added as constraints in the solution matrix. In other words, the least-squares problem is minimized to estimate the daily terrestrial water storage variability for each segment of time studied [49].

$$\left(\left(\mathcal{U}\_{\mathcal{G}ren}\boldsymbol{\pi}-\boldsymbol{d}\right)/\sigma\right)^{2}+\beta^{2}\left(\boldsymbol{L}(\boldsymbol{\pi})\right)^{2}\to\min\tag{10}$$

where *Ugreen* denotes the coefficient matrix of the Green function obtained by Equation (9); *σ* denotes the standard deviation of the hydrological load-deformation sequence; *d* denotes the hydrological load-deformation time series, including the simulated *Ugrid* and *UGNSS*; *L* denotes the Laplace operator; and *β* denotes the smoothing factor.

#### 2.2.3. Construction of DWLIM

Broadly speaking, the hydrology and atmosphere on the surface exert stress on the continental crust. At the same time, the crust will produce corresponding elastic deformation when the stress is less than the elasticity of the crustal rocks [50,51]. Fortunately, GNSS can accurately observe crustal deformation with submillimeter accuracy [52]. In recent years, the crustal load-deformation model has been employed to invert the local TWSA in regions where GNSS stations are densely distributed [53–55]. However, the distribution of GNSS stations is uneven worldwide due to the influence of geographic conditions [12]. Sparsely distributed GNSS arrays cannot be used to accurately invert TWSA because of the limitation of the disk expansion radius. Therefore, it is one of the keys for accurately deriving TWSA to accurately simulate surface load-deformation in the unobserved regions. In this study, DWLIM was constructed by combining LSTM, the inverse distance weighting method, and the crustal load model. The specific process of DWLIM can be divided into the following five steps.


(h-files), and series outliers and step terms that are three times larger than the standard deviation are removed.


**Figure 2.** The flow chart of DWLIM.

#### *2.3. Evaluation Index*

In this study, the root mean square error (*RMSE)*, Nash–Sutcliffe efficiency (*NSE*), and Pearson's correlation coefficient (*PCC*) were utilized to evaluate the accuracy of DWLIM results [57–59], as follows.

$$RMSE = \sqrt{\frac{1}{n} \sum\_{i=1}^{n} (Y\_i - X\_i)^2} \tag{11}$$

$$NSE = 1 - \frac{\sum\_{i=1}^{n} \left(\mathbf{Y}\_{i} - \mathbf{X}\_{i}\right)^{2}}{\sum\_{i=1}^{n} \left(\mathbf{X}\_{i} - \overline{\mathbf{X}}\right)^{2}} \tag{12}$$

$$PCC = \frac{\sum\_{i=1}^{n} \left(X\_i - \overline{X}\right) \left(Y\_i - \overline{Y}\right)}{\sqrt{\sum\_{i=1}^{n} \left(X\_i - \overline{X}\right)^2} \sqrt{\sum\_{i=1}^{n} \left(Y\_i - \overline{Y}\right)^2}}\tag{13}$$

where *Y* and *X* denote accurate and simulated data, respectively, and *Y* and *X* represent the mean value of data. The *RMSE* can be employed to evaluate the deviation of the inversion results from the actual values. The smaller the value of *RMSE*, the better the simulation accuracy. The *NSE* is mainly used to evaluate the performance of the hydrological model, and its value is not larger than 1. The larger the value, the better the hydrological model. When *NSE* is close to 0, it indicates that the effect of the hydrological model agrees with the

average of observed values. The *PCC* is mainly employed to describe the linear correlation between two sequences. The *PCC* value is between −1 and 1. If the *PCC* value is closer to 1, the inversion result is more reliable.

### **3. Results**
