**2. Methods**

#### *2.1. General FWI Formulation*

Seismic wave equation can be expressed as

$$\mathbf{Sw} = \mathbf{f},\tag{1}$$

where **w** denotes the subsurface wavefields, **f** indicates the source wavelet, and **S** is the parameter derivative matrix.

By taking the partial derivative of Equation (1) with respect to parameter, the sensitivity kernel **L** can be acquired,

$$\mathbf{L} = \frac{\partial \mathbf{w}}{\partial \mathbf{m}} = -\left(\mathbf{S}^{-1}\right) \frac{\partial \mathbf{S}}{\partial \mathbf{m}} \mathbf{w}.\tag{2}$$

The gradient of misfit function **g** can be obtained by applying the adjoint of sensitivity kernel to the data residuals between observed and synthetic data,

$$\mathbf{g} = \mathbf{L}^T \boldsymbol{\delta} \mathbf{d}.\tag{3}$$

Based on the Newton optimization, the parameter perturbation can be estimated by solving the Newton equation

$$\mathbf{g} = \mathbf{H}\delta\mathbf{m},\tag{4}$$

where **H** denotes the Hessian operator, which is the second-order partial derivatives of the misfit function with respect to parameter.

Thus, the (*k* + 1)*th* iteration model **m**(*k*+<sup>1</sup>) can be updated by summing the (*k*)*th* iteration model **m**(*k*) and the (*k* + 1)*th* iteration model perturbation *δ***m**(*k*) scaled with a suitable step-length *r*,

$$\mathbf{m}^{(k+1)} = \mathbf{m}^{(k)} + r \cdot \delta \mathbf{m}^{(k)}.\tag{5}$$

#### *2.2. Acoustic-Elastic Coupled EFWI Method*

Compared with the standard elastic wave equation, the original AEC equation requires one more formula to compute the pressure component from elastic wavefields. In this study, we have made some modifications to reduce the number of equations and variables, and thus provide a modified AEC equation (details referred to Appendix A), given by

$$\begin{cases} \rho \frac{\partial^2 u\_x}{\partial t^2} - \frac{\partial \left(\tau\_n^s - p\right)}{\partial x} - \frac{\partial \tau\_s^s}{\partial z} = 0\\ \rho \frac{\partial^2 u\_x}{\partial t^2} - \frac{\partial \tau\_s^s}{\partial x} - \frac{\partial \left(-\tau\_n^s - p\right)}{\partial z} = 0\\ p + (\lambda + \mu) \left(\frac{\partial u\_x}{\partial x} + \frac{\partial u\_z}{\partial z}\right) = f\_p, \\ \tau\_n^s - \mu \left(\frac{\partial u\_x}{\partial x} - \frac{\partial u\_z}{\partial z}\right) = 0\\ \tau\_s^s - \mu \left(\frac{\partial u\_x}{\partial z} + \frac{\partial u\_z}{\partial x}\right) = 0 \end{cases} \tag{6}$$

where *p*, *ux*, and *uz* denote the pressure, horizontal , and vertical particle displacement wavefields, respectively. *τsn* and *τss* are the S-wave-related normal and deviatoric stress components, respectively. *λ* and *μ* are the Lamé constants, and *ρ* is density. *fp* indicates the source function applied to the *p*-component. Compared with the original AEC equation, this modified one can generate OBS 4C records with same accuracy in wavefield simulation but costs less computing resources.

In this study, we define a weighted misfit function to account for both pressure and displacement components, given by

$$E\left(\mathbf{m}\right) = \frac{1}{2}\boldsymbol{\varepsilon} \cdot \left\|\delta d\_x\right\|^2 + \frac{1}{2}\boldsymbol{\varepsilon} \cdot \left\|\delta d\_z\right\|^2 + \frac{1}{2}\left(1 - \boldsymbol{\varepsilon}\right) \cdot \mathbb{J} \cdot \left\|\delta d\_p\right\|^2,\tag{7}$$

where *δdx*, *δdz*, and *δdp* denote the residuals of the horizontal, vertical displacement, and pressure components, respectively. Here, *ε* is a weighting coefficient, satisfying *ε* ∈ [0, 1]. A scale factor *ζ* is used to eliminate the differences in magnitude between pressure and displacement components.

According to the adjoint-state theory [28], the adjoint AEC equation can be given by (see Appendix B for details)

$$\begin{cases} \rho \frac{\partial^2 \hat{u}\_x}{\partial t^2} - \frac{\partial \left[ (\lambda + \mu) \, \dot{\mathcal{P}} \right]}{\partial x} + \frac{\partial \left[ \mu \dot{u}\_x^s \right]}{\partial x} + \frac{\partial \left[ \mu \dot{u}\_s^s \right]}{\partial z} = f'\_x\\ \rho \frac{\partial^2 \hat{u}\_z}{\partial t^2} - \frac{\partial \left[ (\lambda + \mu) \, \dot{\mathcal{P}} \right]}{\partial z} - \frac{\partial \left[ \mu \dot{u}\_m^s \right]}{\partial z} + \frac{\partial \left[ \mu \dot{u}\_s^s \right]}{\partial x} = f'\_z\\ \rho - \left( \frac{\partial \dot{u}\_x}{\partial x} + \frac{\partial \dot{u}\_z}{\partial z} \right) = f'\_p\\ \dot{t}\_n^s + \left( \frac{\partial \dot{u}\_x}{\partial x} - \frac{\partial \dot{u}\_z}{\partial z} \right) = 0\\ \dot{t}\_s^s + \left( \frac{\partial \dot{u}\_x}{\partial z} + \frac{\partial \dot{u}\_z}{\partial x} \right) = 0 \end{cases} \tag{8}$$

where (*u*<sup>ˆ</sup>*x*, *<sup>u</sup>*<sup>ˆ</sup>*z*, *p*ˆ, *<sup>τ</sup>*<sup>ˆ</sup>*sn*, *τ*ˆ*ss* ) is the adjoint wavefields of (*ux*, *uz*, *p*, *<sup>τ</sup>sn*, *τss* ), and *f x*, *f z*, *f p*is the multicomponent adjoint source, satisfying

$$\begin{cases} f'\_{\,\,x} = \varepsilon \cdot \delta d\_{\,x} \\ f'\_{\,\,z} = \varepsilon \cdot \delta d\_{\,z} \\ f'\_{\,\,p} = (1-\varepsilon) \cdot \zeta \cdot \delta d\_{\,p} \end{cases} \tag{9}$$

The gradients of the Lamé constants *gλ*, *gμ* and density *gρ*,*Lame* can be computed by performing zero-lag cross-correlations of the adjoint wavefields (Equation (8)) and the forward wavefields (Equation (6)), given by (see Appendix B for details)

$$\begin{cases} \mathbf{g}\_{\lambda} = -\frac{p}{\lambda + \mu} \hat{\boldsymbol{p}} \\ \mathbf{g}\_{\mu} = -\frac{p}{\lambda + \mu} \hat{\boldsymbol{p}} - \frac{\tau\_{\text{n}}^{s}}{\mu} \hat{\mathbf{t}}\_{n}^{s} - \frac{\tau\_{\text{s}}^{s}}{\mu} \hat{\mathbf{t}}\_{\text{s}}^{s} . \end{cases} \tag{10}$$
 
$$\begin{cases} \mathbf{g}\_{\rho, \text{Lame}} = \frac{\hat{\partial}^{2} \boldsymbol{u}\_{\text{x}}}{\partial t^{2}} \hat{\boldsymbol{u}}\_{\text{x}} + \frac{\hat{\partial}^{2} \boldsymbol{u}\_{\text{z}}}{\partial t^{2}} \hat{\boldsymbol{u}}\_{\text{z}} \end{cases} \tag{11}$$

Compared with the Lamé constants, the parameterization of seismic velocities is a better choice in multiparameter EFWI [29–32]. According to the chain rule, the gradients of P- (*α*) and S-wave velocities (*β*) and density can be obtained,

$$\begin{cases} \mathbf{g}\_{\alpha} = 2\rho \mathbf{a} \cdot \mathbf{g}\_{\lambda} \\ \mathbf{g}\_{\beta} = -4\rho \boldsymbol{\beta} \cdot \mathbf{g}\_{\lambda} + 2\rho \boldsymbol{\beta} \cdot \mathbf{g}\_{\mu} \\ \mathbf{g}\_{\rho, Vel} = \left(\boldsymbol{\alpha}^{2} - 2\rho^{2}\right) \cdot \mathbf{g}\_{\lambda} + \boldsymbol{\beta}^{2} \cdot \mathbf{g}\_{\mu} + \mathbf{g}\_{\rho, Lam\varepsilon} \end{cases} \tag{11}$$

The models of P- and S-wave velocities and density can be updated as follows,

$$
\begin{bmatrix} m\_a^{(k+1)} \\ m\_\beta^{(k+1)} \\ m\_\rho^{(k+1)} \end{bmatrix} = \begin{bmatrix} m\_a^{(k)} \\ m\_\beta^{(k)} \\ m\_\rho^{(k)} \end{bmatrix} - \mathbf{r} \cdot \mathbf{H}^{-1} \begin{bmatrix} \mathcal{G}\_a^{(k)} \\ \mathcal{G}\_\beta^{(k)} \\ \mathcal{G}\_\rho^{(k)} \end{bmatrix} \,. \tag{12}
$$

#### *2.3. Preconditioned Truncated Gauss–Newton Algorithm*

The inverse Hessian operator is estimated by a preconditioned truncated Gauss–Newton (PTGN) algorithm, as shown in Algorithm 1. In each iteration, we should perform demigration (**Lp***k*) and

migration (**L***<sup>T</sup>* (**Lp***k*)) processes to update the parameter perturbations. The migration has been illustrated in Equation (8), and the demigration of OBS 4C data can be computed through a first-order Born modeling operator, given by

$$\begin{cases} \rho \frac{\partial^2 \delta u\_x}{\partial t^2} - \frac{\partial \left(\delta \tau^s\_n - \delta p\right)}{\partial x} - \frac{\partial \delta \tau^s\_s}{\partial z} = -\delta \rho \frac{\partial^2 u\_x}{\partial t^2} \\ \rho \frac{\partial^2 \delta u\_z}{\partial t^2} + \frac{\partial \delta \tau^s\_z}{\partial x} - \frac{\partial \left(-\delta \tau^s\_n - \delta p\right)}{\partial z} = -\delta \rho \frac{\partial^2 u\_z}{\partial t^2} \\ \delta p + (\lambda + \mu) \left(\frac{\partial \delta u\_x}{\partial x} + \frac{\partial \delta u\_z}{\partial z}\right) = -(\delta \lambda + \delta \mu) \left(\frac{\partial u\_x}{\partial x} + \frac{\partial u\_z}{\partial z}\right) \\ \delta \tau^s\_n - \mu \left(\frac{\partial \delta u\_x}{\partial x} - \frac{\partial \delta u\_z}{\partial z}\right) = \delta \mu \left(\frac{\partial u\_x}{\partial x} - \frac{\partial u\_z}{\partial z}\right) \\ \delta \tau^s\_s - \mu \left(\frac{\partial \delta u\_x}{\partial z} + \frac{\partial \delta u\_z}{\partial x}\right) = \delta \mu \left(\frac{\partial u\_x}{\partial z} + \frac{\partial u\_z}{\partial x}\right) \end{cases} \tag{13}$$

#### **Algorithm 1** Preconditioned Truncated Gauss–Newton algorithm

**Input:** Gradient **g**, Hessian precondition operator **<sup>H</sup>***p*;

Set **x**(0) = 0 and **r**(0) = **L***T***Lx**(0) − **g**; Solve **<sup>H</sup>***p* · **y**(0) = **r**(0) for **y**(0); Set **p**(0) = −**<sup>r</sup>**(0) and *k* = 0;

**Output:** Parameter perturbation **x**

$$\begin{aligned} &\text{1: } \text{while } \mathbf{r}^{(k)} > \varepsilon \, \mathbf{do} \\ &\text{2: } \quad \mathbf{s}^{(k)} = \frac{\mathbf{r}^{(k)} \mathbf{r}^{(k)}}{\left(\mathbf{L} \mathbf{p}^{(k)}\right)^{\mathrm{T}} (\mathbf{L} \mathbf{p}^{(k)})} \\ &\text{3: } \quad \mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \mathbf{s}^{(k)} \mathbf{p}^{(k)} \\ &\text{4: } \quad \mathbf{r}^{(k+1)} = \mathbf{r}^{(k)} + \mathbf{s}^{(k)} [\mathbf{L}^{\mathrm{T}} (\mathbf{L} \mathbf{p}^{(k)})] \\ &\text{5: } \quad \text{Solve } \mathbf{H}\_{p} \cdot \mathbf{y}^{(k+1)} = \mathbf{r}^{(k+1)} \text{ for } \mathbf{y}^{(k+1)} \\ &\text{6: } \quad \mathbf{f}^{(k+1)} = \frac{\mathbf{r}^{(k+1)} \mathbf{r}^{(k+1)}}{\mathbf{r}^{(k)} \mathbf{y}^{(k)}} \\ &\text{7: } \quad \mathbf{p}^{(k+1)} = -\mathbf{y}^{(k+1)} + \mathbf{f}^{(k+1)} \mathbf{p}^{(k)} \\ &\text{8: } \quad k = k+1 \\ &\text{9: } \text{end while} \\ &10: \text{ return } \mathbf{x}^{(k+1)} \end{aligned}$$

A diagonal pseudo-Hessian with source-side illumination is used as a precondition operator, given by

$$\mathbf{H}\_p = \left(\frac{\partial \mathbf{S}}{\partial \mathbf{m}} \mathbf{w}\right)^T \left(\frac{\partial \mathbf{S}}{\partial \mathbf{m}} \mathbf{w}\right). \tag{14}$$

*Remote Sens.* **2020**, *12*, 2816

According to the modified AEC equation, the diagonal blocks of the Hessian for Lamé constants and density satisfy

$$\begin{cases} H\_{\lambda\lambda} = \left(\frac{\partial \mathbf{S}}{\partial \lambda} \mathbf{w}\right)^{\mathrm{T}} \left(\frac{\partial \mathbf{S}}{\partial \lambda} \mathbf{w}\right) = \left(\frac{p}{\lambda + \mu}\right)^{2} \\ H\_{\mu\mu} = \left(\frac{\partial \mathbf{S}}{\partial \mu} \mathbf{w}\right)^{\mathrm{T}} \left(\frac{\partial \mathbf{S}}{\partial \mu} \mathbf{w}\right) = \left(\frac{p}{\lambda + \mu}\right)^{2} + \left(\frac{\tau\_{\mathrm{n}}^{\mathrm{e}}}{\mu}\right)^{2} + \left(\frac{\tau\_{\mathrm{s}}^{\mathrm{e}}}{\mu}\right)^{2}, \\ H\_{\rho\rho} = \left(\frac{\partial \mathbf{S}}{\partial \rho} \mathbf{w}\right)^{\mathrm{T}} \left(\frac{\partial \mathbf{S}}{\partial \rho} \mathbf{w}\right) = \frac{\partial^{2} u\_{\mathrm{x}}}{\partial t^{2}} \frac{\partial^{2} u\_{\mathrm{x}}}{\partial t^{2}} + \frac{\partial^{2} u\_{\mathrm{z}}}{\partial t^{2}} \frac{\partial^{2} u\_{\mathrm{z}}}{\partial t^{2}} \end{cases} \tag{15}$$

and the ones for P- and S-wave velocities and density can be given by

$$
\begin{bmatrix}
H\_{aa} & H\_{a\beta} & H\_{a\rho} \\
H\_{a\beta} & H\_{\beta\delta} & H\_{\beta\rho} \\
H\_{a\rho} & H\_{\beta\rho} & H\_{\rho\rho}
\end{bmatrix} = \\
$$

$$
\begin{bmatrix}
2\rho & 0 & 0 \\
a^2 - 2\beta^2 & \beta^2 & 1
\end{bmatrix} \cdot \begin{bmatrix}
H\_{\lambda\lambda} & H\_{\lambda\mu} & H\_{\lambda\rho} \\
H\_{\lambda\mu} & H\_{\mu\mu} & H\_{\mu\rho} \\
H\_{\lambda\rho} & H\_{\mu\rho} & H\_{\rho\rho}
\end{bmatrix} \cdot \begin{bmatrix}
2\rho & 0 & 0 \\
a^2 - 2\beta^2 & \beta^2 & 1
\end{bmatrix}^T
$$

Thus, we have

$$\begin{cases} H\_{\text{ax}} = 4\alpha^2 \rho^2 H\_{\lambda\lambda} \\\\ H\_{\beta\beta} = 16\beta^2 \rho^2 H\_{\lambda\lambda} + 4\beta^2 \rho^2 H\_{\mu\mu} + H\_{\rho\rho} \\\\ H\_{\rho\rho} = \left(\alpha^2 - 2\beta^2\right)^2 H\_{\lambda\lambda} + \beta^4 H\_{\mu\mu} + H\_{\rho\rho} \end{cases} \tag{17}$$

#### **3. Sensitivity Analysis**

In regional and global seismic explorations, sensitivity kernels are always used to portray subsurface wavepaths [33,34]. For the elastic case, the kernels of displacement components with different parameter classes have been studied on the standard elastic equation [23,35]. In this study, we use the modified AEC equation to simulate elastic wavefields, allowing for the kernels of pressure and displacement components. The experiment is performed on the elastic Marmousi model (see Figure 1), including a shallow water layer below the sea surface. Only one shot is excited to generate a pure P-wave source at the place of (5 km, 0.03 km), and 601 4C receivers are evenly located at the seabed. The peak frequency of the source function is 8 Hz.

Observed pressure and displacement records are back-propagated from the receivers, respectively. The obtained sensitivity kernels of pressure (*p*) and displacement (*ux* and *uz*) components are presented in Figure 2. In the pressure kernels, most energy is distributed in the shallow part of the model and attenuates rapidly with the increase of depth. In contrast, the displacement kernels contain the S-wave reflection paths, thus it can enhance the illumination for the deep model. The corresponding 2D wavenumber spectrum of these kernels are shown in Figure 3. It is clear that the pressure kernels have higher resolution than the displacement ones in both vertical and horizontal directions. It is because the pressure component is computed by the spatial partial derivatives of the displacement wavefield, and these operators physically increase the frequency (or wavenumber) content of the data. Consequently, the simultaneous utilization of pressure and displacement components can provide a better characterization of subsurface structures.

**Figure 1.** True parameters of the elastic Marmousi model: *V p* (**a**), *Vs* (**b**), and *ρ* (**c**).

**Figure 2.** Sensitivity kernels of pressure (top) and displacement (bottom) components with respect to (**<sup>a</sup>**,**d**) *V p*, (**b**,**<sup>e</sup>**) *Vs*, and (**<sup>c</sup>**,**f**) *ρ*.

The kernels for different parameter classes also have remarkable differences. The *V p* kernels (Figure 2a,d) are mainly formed by the diving waves and reflections associated with P-wave, which show relatively isotropic distributions in the wavenumber spectrum. The *Vs* kernels contain more information of PS reflections (Figure 2b,e). Besides, the *ρ* kernels are of the highest wavenumber components and behave as migration images of subsurface interfaces. That is the reason why we can easily obtain the short-wavelength structures of density model but fail to reconstruct the background model from seismic data.

**Figure 3.** The 2D Wavenumber spectrum for the sensitivity kernels of pressure (top) and displacement (bottom) components with respect to (**<sup>a</sup>**,**d**) *V p*, (**b**,**<sup>e</sup>**) *Vs*, and (**<sup>c</sup>**,**f**) *ρ*.
