**Abbreviations**

The following abbreviations are used in this manuscript:



#### **Appendix A. Artifacts Correction by Iterative Algorithms**

In this appendix, we describe the principles of the averaged time reversal and the adjoint method in more details. Interested readers are referred to [8,14] for the accurate and complete exhibition.

#### *Appendix A.1. Averaged Time Reversal*

The averaged time reversal method is proposed in [12] as a remedy to the conventional TR to remove the reflection artifacts in the latter. The essential improvement comes from the introduction of an averaging process to the measured data before reversing the time. The rational, roughly speaking, is that artifacts generated by acoustic reflections have either positive amplitude or negative amplitude, depending on the number of times they touch the boundary. A suitable averaging process along the time direction is thus able to annihilate reflected artifacts with opposite signs. The process can be viewed as a pre-conditioning to the TR. To be a bit more precise, let **Λ** be the linear operator

$$\mathbf{A}: \qquad p\_0 \mapsto \boldsymbol{\mu}|\_{[0,T]\times\partial\Omega} \tag{A1}$$

where *u* is the solution of the forward model (1). In other words, **Λ** sends the initial pressure to the measured data. This is a linear operator that is completed determined by the form of the partial differential equation in (1). If the forward model is discretized, **Λ** would just be a matrix. Reconstructing the initial pressure from the ultrasound signals amounts to inverting the matrix **Λ**. Direct inversion is normally impossible due to the large dimensionality of this matrix. Instead, it is shown in [12] that one can introduce an averaged time reversal process *A* such that the initial pressure *p*0 can be reconstructed iteratively from the measured data *<sup>u</sup>*|[0,*<sup>T</sup>*]×*∂*<sup>Ω</sup> by the relation

$$p\_0 = \sum\_{k=0}^{\infty} (\mathbf{A}\mathbf{A} - Id)^k \mathbf{A}(u|\_{[0,T]\times\partial\Omega}) \tag{A2}$$

where *Id* is the identity operator (or identity matrix if the forward model is discretized). This algorithm is known as the averaged time reversal (ATR). It is mathematically proved in [12] that ATR can correct the reflection artifacts caused by conventional time reversal methods. It is one of the baseline methods we used in the paper to compare with the DL reconstruction.

#### *Appendix A.2. Landweber Iteration*

The Landweber iteration, also known as the Landweber algorithm, is an algorithm proposed in the 1950s by Landweber [56] to solve ill-posed linear systems of the form **Λ***x* = *b* where **Λ** is a (not necessarily square) matrix. It is a regularization method that can be viewed as iteratively solving the unconstrained optimization problem

$$\min\_{\mathbf{x}} \frac{1}{2} \|\mathbf{A}\mathbf{x} - \mathbf{b}\|\_{2'}^2 \tag{A3}$$

which leads to the update scheme

$$\mathbf{x}\_{k+1} = \mathbf{x}\_k - \gamma \mathbf{A}^\*(\mathbf{A}\mathbf{x}\_k - \mathbf{b}), \qquad k = 0, 1, 2, \dots \tag{A4}$$

where **Λ**<sup>∗</sup> is the adjoint operator of **Λ** and *γ* is a regularization parameter. It is well known the Landweber iteration is convergen<sup>t</sup> if 0 < *γ* < 2 *σ*<sup>2</sup> 1 where *σ*1 is the largest singular value of **Λ**, and it converges to the projection of the true solution on the orthogonal complement of the null space of **Λ**, see the analysis in [14] for instance. In our case, the vector *x* is the discretized version of *p*0, and the vector *b* is the discretized version of the measured data *<sup>u</sup>*|[0,*<sup>T</sup>*]×*∂*Ω. The iteration exhibits the effect of semi-convergence: it converges before reaching a certain number of iterations but then diverges once beyond. The optimal value of *γ* and the stopping rule are largely empirical. Some guiding principles exist, but we do not intend to discuss them in this article.
