**1. Introduction**

Photo-Acoustic Tomography (PAT) is an emerging non-invasive modality that has manifested an enormous prospect for some clinical practices [1]. In PAT, the tissue is illuminated with near-infrared light of wavelength 650–900 nm. The absorbed optical energy is transformed into acoustic energy through the photo-acoustic effect, and the generated ultrasound is measured by transducer arrays outside the tissue to retrieve the optical properties of the tissue. The coupling mechanism of optical and ultrasound waves gives multiple advantages over conventional individual imaging modalities. As the acoustic waves experience much less scattering in tissue compared to optical waves, PAT can generate high-resolution images in the presence of strong optical scattering to break the optical diffusion limit [2]. The image can reach submillimeter spatial resolution while preserving intrinsic optical contrast [3].

Typical photo-acoustic signal generation comprises three steps: (1) the tissue absorbs light; (2) the absorbed optical energy causes a temperature rise; (3) thermo-elastic expansion occurs and generates ultrasound. The image formation in PAT is to recover the distribution of the deposited energy, known as the local optical fluence, from the ultrasound signals that are recorded by the sensors deployed around the tissue. As the initial ultrasound pressure is approximately proportional to the optical fluence, it is sufficient to reconstruct the initial pressure from the recorded ultrasound signals.

The quality of PAT images relies on multiple factors, and one of them is the acoustic reflection. The reflection can be caused by either hyperechoic structures or special setting for making a measurement such as in reverberant-field PAT [4]. Conventional PAT reconstruction algorithms, such as the universal back-projection formula [5] and time-reversal-based reconstruction [6–10], are based on the spherical mean radon transform on canonical geometries and not designed to

take the acoustic reflections into account. As a result, the reflected signals are projected along with the real signal back into the image domain, resulting in artifacts that are indistinguishable from the real biological structures. Clinical practitioners who rely on the misleading artifacts to make judgment could come to erroneous conclusions. It is, therefore, essential and of practical significance to design new methods to eliminate the impact of such acoustic reflection in PAT image reconstruction.

Some iterative algorithms were developed to correct the effect of the acoustic reflection. Examples include Fourier analysis [11], averaged time reversal [12], dissipative time reversal [13], and adjoint method [14]. Nonetheless, each of them has its limitation: the Fourier approach applies only to regular domains; averaged/dissipative time reversals, and Landweber iterations can generate high-quality images but time-consuming.

The purpose of this paper is to accelerate and improve the iterative algorithms by incorporating a deep neural network (DNN) structure to reduce the reflection artifacts efficiently and effectively. The entire network contains three parts—feature extraction, artifacts reduction, and reconstruction. A modified version of the U-net with large convolutional filter sizes is used as the backbone of the artifact reduction part to capture the global features by increasing the size of the receptive field. The network is applied to take the place of iterations to accelerate the correction of reflection artifacts. The learning process can simultaneously reduce the image noise as well.

With the rapid increment of computational power, DNNs and deep learning techniques have received considerable attention in recent years for tomographic image reconstruction [15–17]. In particular, they have achieved the state-of-the-art performance in PAT image reconstruction in the scenarios of sparse data [18–20], limited view [21–23], artifacts removal [24–26], as well as other applications [27,28]. It is worth pointing out that the reference [25] considers another type of reflection artifact with different focuses and approaches. The reflection in [25] is caused by point-like targets inside the tissue, while ours by planar reflectors outside the tissue. The network in [25] is mostly based on convolutional neural networks (CNNs) while ours on the U-net structure. The two networks target different problems with associated unique networks and therefore are of independent values. We remark that besides in PAT, DNNs have been successfully applied to other imaging modalities as well, see [29–35] for its applications in CT.

## **2. Data Generation**

This section presents the forward model of PAT and the reflection artifacts induced by the acoustic reflection of signals.

#### *2.1. The Forward Model*

We describe the forward model, which is used to generate ultrasound signals for training purpose. Sound hard reflectors, which can bounce the incident ultrasound back with the opposite speed without absorbing any energy, are placed around the tissue to simulate the acoustic reflections. As such reflectors do not dissipate energy, the resultant reflection artifacts are the strongest, hence can be used as a benchmark to test the artifact-correction capability of different algorithms.

Denote by Ω the biological tissue, which is illuminated by rapid pulses of laser, and afterwards, emits the ultrasound. The boundary of the tissue is represented using the conventional notation *∂*Ω. The forward ultrasound propagation model, in the presence of sound hard reflectors, reads in 2D as follows:

$$\begin{cases} \left(\partial\_t^2 - c^2(r)\Delta\right)p &=& 0 \qquad \text{in } (0,T) \times \Omega, \\ \qquad p|\_{t=0} &=& p\_0(r), \\ \left.\partial\_t p\right|\_{t=0} &=& 0, \\ \left.\partial\_r p\right|\_{[0,T]\times\partial\Omega} &=& 0, \end{cases} \tag{1}$$

where Δ = *∂*2 *∂x*<sup>2</sup> + *∂*2 *<sup>∂</sup>y*<sup>2</sup> is the Laplace operator, *T* is the stoppage time, *p*0(*r*) is the initial ultrasound pressure, *p*(*<sup>t</sup>*,*<sup>r</sup>*) is the ultrasound pressure of the spatial location *r* at time *t*, *c*(*r*) is the wave speed, and *∂ν* is the normal boundary derivative. The main difference of this model with the conventional photo-acoustic models (see, for instance [2,36]) is the boundary condition *∂ν <sup>p</sup>*|[0,*<sup>T</sup>*]×*∂*<sup>Ω</sup> = 0, which is the standard mathematical model to describe sound hard reflectors. Please note that the Neumann boundary condition corresponds to the sample/water interface, and the Dirichlet boundary condition, which corresponds to the sample/air interface, can be handled similarly. We remark that modeling PAT using the Neumann and Dirichlet boundary conditions was reported in the literature [37,38]. The ultrasound is recorded at the boundary of Ω using 508 sensors deployed evenly around the tissue. This measurement amounts to the temporal boundary value of *p*, i.e., *<sup>p</sup>*|[0,*<sup>T</sup>*]×*∂*Ω. The image formation in PAT aims to recover the distribution of the deposited energy that is the local optical fluence. As the initial ultrasound pressure *p*0(*r*) is proportional to the optical fluence, it remains to reconstruct the initial ultrasound pressure *p*0(*r*) from the recorded ultrasound signals.

We simulate numerous ultrasound signals to train the neural network. This is accomplished by solving the forward model (1) using the second order finite difference time domain (FDTD) method with the central difference formula. The simulation is implemented using the MATLAB code written by the authors. The computational domain is a 128 × 128 grid with uniform spacing. The time step is chosen according to the spatial step to fulfill the Courant–Friedrichs–Lewy (CFL) condition. The sound speed *c*(*r*) is either a constant or a spatially varying function, with the former modeling the propagation in homogeneous media and the latter in heterogeneous media. The stoppage time is *T* = 4, which is chosen in such a way that the ultrasound generated from any interior source can reach the boundary.
