#### *2.3. Phantom Design*

The phantoms shown in Figure 1 were constructed in the GATE Monte Carlo simulation using a voxelized source and a voxelized phantom to define the activity distribution and photon attenuation, respectively. PHANbrain was the Hoffman 3D brain phantom; Figure 1a–c show its axial, coronal, and sagittal views. To increase the size and diversity of the dataset for CNN training, PHANbrain was slightly modified to generate 20 phantom configurations (2 translations × 2 rotations × 5 deformations), each filled with an activity of 3.7 × 10<sup>6</sup> Bq. PHAN5rod was a cylinder of 50 mm diameter and 80 mm length containing 5 rods with diameters of 2, 4, 6, 8, and 10 mm; Figure 1d shows its axial view. The target-to-background ratio (TBR) was set to 0, 2, 4, 5, 8, 10, 16, and 20 to generate 8 phantom configurations, with the rod inserts filled with an activity concentration of 1.69 × 10<sup>6</sup> Bq/mL. PHAN1sphere was a cylinder of 50 mm diameter and 80 mm length containing a 10-mm-diameter sphere; Figure 1e shows its axial view. The sphere within PHAN1sphere was filled with water (i.e., a cold sphere), while the cylinder was filled with an activity concentration of 1.69 × 10<sup>6</sup> Bq/mL. PHAN20rod was an elliptical cylinder with a 55 mm major axis, a 50 mm minor axis, and an 80 mm length, containing 20 rods of 2, 3, 4, and 5 mm diameter; Figure 1f shows its axial view. The white rod inserts within PHAN20rod were filled with an activity concentration of 1.69 × 10<sup>6</sup> Bq/mL, while the gray rod inserts were filled with 8.44 × 10<sup>5</sup> Bq/mL. Overall, a total of 30 phantom configurations were modeled in the GATE Monte Carlo simulation. Each configuration was simulated twice with a 20 min emission scan duration: once for Ga-68 (CNN input images) and once for back-to-back 511-keV gamma rays (CNN output images).
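The configuration count above can be tallied in a short sketch; the grouping names are illustrative and do not come from the actual GATE macros:

```python
# Sketch: tallying the simulated phantom configurations described above.
from itertools import product

# PHANbrain: 2 translations x 2 rotations x 5 deformations
phan_brain = list(product(range(2), range(2), range(5)))
phan_5rod_tbrs = [0, 2, 4, 5, 8, 10, 16, 20]  # one configuration per TBR
phan_1sphere = 1                              # single cold-sphere configuration
phan_20rod = 1                                # single hot/warm rod configuration

n_configs = len(phan_brain) + len(phan_5rod_tbrs) + phan_1sphere + phan_20rod
print(n_configs)      # 30 phantom configurations
print(2 * n_configs)  # 60 simulations (Ga-68 + back-to-back 511-keV gammas)
```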

**Figure 1.** PHANbrain in (**a**) axial plane, (**b**) coronal plane, (**c**) sagittal plane, and the central axial slice of (**d**) PHAN5rod, (**e**) PHAN1sphere, (**f**) PHAN20rod.

#### *2.4. CNN Models for Positron Range Correction*

Figure 2 shows the architectures of the CNN models used in this study to compensate for the positron range effects of Ga-68 in preclinical PET imaging. CNN1 was a 3-layer model proposed by Dong et al. for super-resolution recovery [15]. CNN2 was a 4-layer model proposed by Nie et al. for pseudo-CT synthesis from MRI [16]. CNN3 was the deeply supervised nets (DSN) version of CNN2, which supervises the features at each convolutional stage through layer-wise dense connections in both the backbone network and the prediction layers [17].

Because the error distribution was expected to be Gaussian, the root mean square error (RMSE), i.e., the Euclidean distance, was used as the loss function to minimize the difference between the Ga-68 PET images and the corresponding gamma source images. Using RMSE as the loss function favors a high peak signal-to-noise ratio (PSNR). The input images were prepared as 32 × 32-pixel sub-images randomly cropped from the original images. To avoid border effects, none of the convolutional layers used padding, so the networks produced output images with a 20 × 20 matrix size for CNN1 and an 18 × 18 matrix size for CNN2 and CNN3. The training dataset consisted of sub-images extracted from the PET images of 16 PHANbrain and 4 PHAN5rod (TBR = 0, 4, 5, 8) configurations with a stride of 14. The testing dataset consisted of sub-images extracted from the PET images of 4 PHANbrain configurations (other than those used in CNN training) and 4 PHAN5rod configurations (TBR = 2, 10, 16, 20) with a stride of 21. The training and testing datasets comprised approximately 111,078 and 25,774 sub-images, respectively.

The filter weights of each layer were initialized using Xavier initialization, which automatically determines the scale of the initialization from the number of input and output neurons [18]. All biases were initialized to zero. The models were trained using stochastic gradient descent with a mini-batch size of 128, a learning rate of 0.01, and a momentum of 0.9.
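The patch preparation and the shrinking output size of unpadded convolutions can be sketched as follows. The per-layer kernel sizes are assumptions for illustration (e.g., a 9-1-5 stack, as in Dong et al.'s 3-layer model, maps a 32 × 32 input to a 20 × 20 output), and the 128 × 128 slice size in the usage example is hypothetical, not taken from the text:

```python
# Sketch: valid-convolution output sizes, strided sub-image extraction,
# and the RMSE loss described above. Kernel sizes are illustrative.
import numpy as np

def valid_output_size(input_size, kernel_sizes):
    """Side length after a stack of unpadded, stride-1 convolutions."""
    for k in kernel_sizes:
        input_size -= k - 1
    return input_size

def extract_patches(image, patch=32, stride=14):
    """All fully contained patch x patch sub-images at the given stride."""
    h, w = image.shape
    return [image[i:i + patch, j:j + patch]
            for i in range(0, h - patch + 1, stride)
            for j in range(0, w - patch + 1, stride)]

def rmse(pred, target):
    """Root mean square error (Euclidean distance) between two images."""
    return np.sqrt(np.mean((pred - target) ** 2))

print(valid_output_size(32, [9, 1, 5]))  # 20 (CNN1-like 3-layer stack)
img = np.zeros((128, 128))               # hypothetical slice size
print(len(extract_patches(img)))         # 49 sub-images at stride 14
```

At training time, the 32 × 32 target would be center-cropped to the 20 × 20 (or 18 × 18) output size before the RMSE is computed, since the unpadded networks return a smaller image than they receive.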
The CNN models were built, trained, and tested using the Caffe (Convolutional Architecture for Fast Feature Embedding) platform (version 1.0.0-rc5 with CUDA 8.0.61) on an Ubuntu server (version 16.04.4 LTS) equipped with two RTX 2080 (NVIDIA, Santa Clara, CA, USA) graphics cards [19].
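The training settings above map onto a Caffe solver file roughly as follows; this is a minimal sketch, and the net file name and learning-rate policy are assumptions rather than details from the study:

```
# solver.prototxt (sketch; "train_val.prototxt" and lr_policy are assumptions)
net: "train_val.prototxt"
type: "SGD"
base_lr: 0.01
momentum: 0.9
lr_policy: "fixed"
solver_mode: GPU
```

Note that in Caffe the mini-batch size of 128 is set in the data layer of the network definition, not in the solver file.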

**Figure 2.** The architectures of CNN1 (**left**), CNN2 (**middle**), and CNN3 (**right**).
