*Appendix A.2. Superposition with Weight Calculation*

Superposition is a weighted averaging process that filters the receiver array signal in the spatial domain. In this paper, instead of preset and constant weighting coefficients, the adaptive weighting method was applied to obtain weights that cause different receivers to have different influences on the imaging result.

In signal superposition, we suppose that the number of receivers is *M* and that the signal of the receiver array is *x*(*t*), which includes the signal from target point *s*(*t*) as well as noise *i*(*t*) and interference *v*(*t*):

$$x(t) = s(t)a + i(t) + v(t) \tag{A2}$$

where *a* is the direction vector of the beam and *t* is the time sequence of the signal. The beamformer *b*(*t*) is defined as:

$$b(t) = w^H y(t) = \sum\_{m=1}^{M} w\_m \, ^\*x\_m(t - \tau\_m) \tag{A3}$$

where *w* is the beamforming weight vector, *H* represents the conjugate transpose, and *wm*<sup>∗</sup> represents a conjugate matrix of *wm*.

$$y(t) = \begin{bmatrix} \mathbf{x}\_1(t - \tau\_1) & \mathbf{x}\_2(t - \tau\_2) \ \dots & \mathbf{x}\_3(t - \tau\_3) \end{bmatrix} \tag{A4}$$

is the signal of each receiver after a time delay. *τ<sup>m</sup>* is the delay value of each receiver. The signal is similar to a vertically incident plane wave at each receiver after time delay processing. In this case, the direction vector *a* is a vector composed entirely of 1:

$$a = \begin{bmatrix} 1 \ 1 \ \dots \ 1 \end{bmatrix}^{\mathrm{T}} \tag{A5}$$

where *T* represents matrix transposition. When *w* changes with the received signal to improve the image quality, *b*(*t*) becomes an adaptive beamformer. The weight vector *w* (Capon, 1969) [21] can be found from the maximum of the signal-to-interference-plus-noise ratio (SINR) [33]:

$$\text{SINR} = \frac{\sigma\_s^2 \left| w^H a \right|^2}{w^H R\_{i+n} w} \tag{A6}$$

where *σ*<sup>2</sup> *<sup>s</sup>* is the signal power, *Ri*+*n*(*t*) is the interference-plus-noise covariance, and *a* is the direction vector in Equation (A5). The Maximizing Equation (A6) is equivalent to minimizing the output interference-plus-noise power while maintaining a distortionless response to the desired signal. In practice, the exact interference-plus-noise *Ri*+*n*(*t*) is unavailable, so the sample covariance matrix *R*(*t*) is used in N receivers:

$$R(t) = \frac{1}{N} \sum\_{n=0}^{N} y(t) y^{H}(t) \tag{A7}$$

Then, this issue can be equivalent to:

$$\min\_{w} w^H R w \text{ \textquotedblleft Row\textquotedblright to } w^H a = 1 \tag{A8}$$

By using the Lagrange multiplier method, the analytic formula is:

$$w\_{\rm MV} = \frac{R^{-1}a}{a^H R^{-1} a} \tag{A9}$$

where *wMV* is the MV beamforming weight vector and *a* is a known vector according to Equation (A5). The next step is to obtain the estimated covariance matrix *R* of the signal data. The spatial smoothing method is used to obtain a good estimation. The receiver array is divided into several overlapping subarrays, and the covariance matrix of these subarrays is averaged to obtain a robust covariance matrix estimation. This is also defined as spatial smoothing processing [22]:

$$\hat{R}(\mathbf{t}) = \frac{1}{M - L + 1} \sum\_{l=0}^{M-L} \overline{y\_l}(t) \overline{y\_l}^H(t) \tag{A10}$$

where *M* is the total number of receivers and *L* is the length of the subarray. As shown in Figure A2, there are *M* receivers in total, and they are divided into *M* − *L* + 1 subarrays. Each of the subarrays is *L* receivers long. From left to right, subarray No. 1 to subarray No. *M* − *L* + 1 covers all receivers.

**Figure A2.** Illustration of subarrays.

*yl*(*t*) is the delayed signal of No. *l* subarray:

$$
\overline{y\_l}(t) = \begin{bmatrix} y\_l(t) \\ y\_{l+1}(t) \\ \vdots \\ y\_{l+L-1}(t) \end{bmatrix} \tag{A11}
$$

The size of each subarray must satisfy *L* ≤ *M*/2 (Asl, 2009 [26]) to ensure that the estimated covariance matrix is invertible (full rank). However, reducing *L* increases the robustness but reduces the resolution, and setting *L* = 1 makes the beamformer a DAS beamformer with uniform weights. Therefore, there is a trade-off between the robustness and resolution. The critical value *L* = *M*/2 is taken to achieve the highest possible imaging resolution.

Diagonal loading (DL) [34] is adopted to improve the robustness of the algorithm by adding spatial white noise to the recorded wavefield and to adjust the weight calculation [25] using:

$$\mathcal{R}\_{dl} = \mathcal{R}(t) + \delta \cdot \text{trace}\{\mathcal{R}(t)\} \cdot \mathbf{I} \tag{A12}$$

to replace *R*ˆ(*t*), where *δ* represents the DL factor and I represents the identity matrix. The DL factor should satisfy *δ* 1/*L*. Then, we replace *R* in (8) with *Rdl*, and the weight of the beamformer becomes:

$$w\_{MV} = \frac{R\_{dl}^{\quad} {}^{-1}a}{a^H R\_{dl}^{\quad} {}^{-1}a} \tag{A13}$$

The beamformer can be written as:

$$b(t) = \frac{1}{M - L + 1} \sum\_{l=0}^{M-L} w\_{MV}^H \overline{y\_l}(t) \tag{A14}$$

A weighting factor modified from the CF map is presented as:

$$\text{CF}(t) = \frac{\left| \sum\_{l=0}^{M-1} y\_l(t) \right|^2}{M \sum\_{l=0}^{M-1} \left| y\_l(t) \right|^2} \tag{A15}$$

CF ranges from 0 to 1: when CF(*t*) = 1, it is a perfect focused point at time *t*, and when CF(*t*) = 0, the signals of different receivers are not coherent. After multiplication by the amplitude beamformer *b*(*t*), the final image of the Nth shot *It*(*N*) is obtained as:

$$I\_t(N) = b(t) \cdot \mathbb{C}F(t) \tag{A16}$$
