*4.1. Experimental Setup*

## 4.1.1. Simulated Dataset

We used 5 cadaver CT scans from Massachusetts General Hospital [47] for simulating PAT dataset. These 5 cadavers were scanned on a GE Discovery 750 HD scanner under 120 kVp x-ray spectra. Noise index (NI) was used by GE to define the image quality, which is approximately equal to the standard deviation of the CT number in the central region of the image of a uniform phantom [48]. In this study, we used a noise index of 10, which represents a normal-dose scanning. Then, the images were reconstructed by the GE commercial iterative reconstruction algorithm, called adaptive statistical iterative reconstruction (ASIR-50%). To reduce the computational cost for simulating the data and training the network, we extracted image patches of size 128 × 128 from CT scans. More specifically, 64,000 image patches were randomly selected from 3 scans for training purpose. Then, 6400 image patches were randomly selected from 1 scan for validation. Finally, 200 image patches were randomly selected from 1 scan for testing the trained model. Please note that the patients for training, validation, and testing sets were randomly selected from 5 patients CT scans without replacement. Since the Hounsfield unit (HU) used in CT imaging ranges from −1000 to ∼ +2000, we are interested in the complex tissue structure within [−160, 240] HU window for our PAT simulation. Thus, with this selected HU window, image patches were first normalized into [0, 1] serving as the initial source for PAT imaging, and the output of the first iteration of ATR serves as the input to the network.

## 4.1.2. Baseline Method

We compare the proposed network-based reconstruction with two state-of-the-art iterative algorithms in correcting reflection artifacts. These iterative algorithms, known as the ATR and adjoint method respectively, are designed to remove the reflection artifacts from the mathematical point of view. The reasons for choosing these two methods are three-fold. Firstly, in contrast to the universal back-projection formula [5] which is mathematically valid only in homogeneous media, these methods can be applied to general heterogeneous media. The universal back-project formula is a special case of the TR method if one sets the sound speed to be constant and applies the Kirchhoff solution formula for the 3D acoustic wave equation. Secondly, the TR and adjoint methods are applicable with arbitrary closed surfaces of sensor arrays. In a typical TR or adjoint process, the measured ultrasound signals are re-transmitted in a temporally reversed order back to the tissue. This can be numerically simulated by solving the wave propagation model backwards to the initial moment while using the measured ultrasound signal as the boundary condition, regardless of the geometry of sensor arrays. Thirdly, TR and adjoint methods are popular in both research and application, for instance [8,9,49,50] for various TR methods and [51–53] for various adjoint methods. Their implementation has also been included in open source packages such as the k-wave MATLAB package [54]. A sketchy introduction to these iterative algorithms is included in the Appendix A for the convenience of the readers.

## 4.1.3. Parameter Setting

For our proposed network, the initial learning rate *λ* was set as 1.0 × 10−<sup>3</sup> and was adjusted by every epoch, namely *λt* = √*λt* at the *t*-th epoch. For the Adam optimization, the coefficients used for computing running averages of gradients and its square were set as 0.9 and 0.999, respectively. The network was implemented with PyTorch DL library [55] and trained within 60 epochs using four NVIDIA GeForce GTX 1080 Ti GPUs. The batch size was set as 512 during the training.

For these two iterative algorithms, the number of iterations for ATR and Landweber is empirically set to be 10. The regularization parameter in Landweber is empirically chosen as 0.03.

## 4.1.4. Evaluation Metrics

For the evaluation of image quality, we used the peak signal-to-noise ratio (PSNR) and SSIM in our experiments. The SSIM has been defined in (5) and PSNR is defined as follows:

$$\text{PSNR}(I\_{\text{est}}, I\_{\text{IS}}) = 10 \log\_{10} \left( \frac{R^2}{\text{MSE}(I\_{\text{est}}, I\_{\text{IS}})} \right), \tag{7}$$

where *R* is the maximum possible pixel value of the images, which is 1 in this study as the images are normalized into [0, 1].
