*2.2. Pre-Processing*

Although CT and MRI scans were acquired the same day and with an immobilization setup, inter-acquisition motion (i.e., anatomical changes between CT and MRI scans) was present. A manual rigid registration was firstly performed to align CT and MRI scans. The application of deformable image registration (DIR) was investigated but did not show relevant improvements on the quality of sCTs; therefore, it was not applied (Supplementary Material S1). Indeed, in most cases, the multi-modal DIR could hardly compensate for air filling mismatches and caused large bone deformations and artefacts that reduced the size of the training dataset, making it a less effective approach. Nonetheless, given the lack of a real ground truth (i.e., a CT with MRI anatomical configuration), DIR was applied for the generation of pseudo ground truths (see Section 2.4).

CT slices were resampled to the corresponding MRI spacing to guarantee the voxel consistency among the two volumes and clipped to [−1000,+1047] HU as previously performed by Maspero et al. [14] to reduce the discretization step, while background values were set to −1000 HU. For MRI volumes, the pre-processing consisted of: (i) bias field correction to reduce low frequency noise due to magnetic field inhomogeneities [33]; (ii) reduction of Gaussian noise through a non-linear bilateral filter; (iii) contrast enhancement through histogram clipping to 99th percentile of intensity [14,21,24]: (iv) setting of background

values to zero; and (v) histogram matching to similarly distribute grey levels across all MRI volumes [34].

CT and MRI preprocessed volumes were all resized to 256 × 256 to match the network input dimensions [19]. Then, the input MRI and target CT transversal slices were segmented in three channels by means of CT-based masks (i.e., air [−1000,−800] HU, bone [150,1047] HU, soft tissues [−800,150] HU) and then linearly scaled to [−1,1].

As a final step, eight transformations were applied to each CT/MRI channel triplet used for training, including horizontal or vertical flip, Gaussian noise adding, shear, rotation and cropping. Thus, the initial training dataset composed of 2014 CT-MRI slice pairs was enlarged to 16,112. Pre-processing was performed through Python scripts implemented using SimpleITK modules (version 2.0.2).

### *2.3. Neural Network*

The neural network used in this work consisted of a cGAN derived from the opensource network "Pix2Pix" by Isola et al. [19], adapted to work on three channels (air, bone, soft tissues) to better account for the anatomical complexity of the abdomen [35] and to deal with the limited dataset of 39 volume pairs.

The net was trained on transversal slices and composed of a 256 × 256 × 3 U-net generator (Figure 1a), used to generate sCTs, and a 70 × 70 × 3 PatchGAN discriminator (Figure 1b), used to judge the quality of the output with high resolution during the training [19]. The U-net is composed of eight encoder blocks, each comprising convolution, batch normalization and leaky rectified linear unit (ReLU) layers, and seven decoder blocks, each composed of transposed convolution, batch normalization, dropout and ReLU layers. The PatchGAN architecture is made of four encoding blocks, such that each pixel of the 30 × 30 output classifies a 70 × 70 pixel patch of the input image. For the detailed architecture, refer to Figure 1a,b.

The loss function (Equation (3)) combined *L1* norm (Equation (1)) and cGAN adversarial cross-entropy loss (Equation (2)) to reduce blurring effects and artefacts [19]:

$$L\_1 = \; E\_{\mathbf{x}, \mathbf{y}, \mathbf{z}}[||\mathbf{x} - \mathbf{G}(\mathbf{y}, \mathbf{z})||\_1] \tag{1}$$

$$L\_{\rm cGAN} = -\left[\mathbb{E}\_{\rm y,x}[\log D(y,\mathbf{x})] + E\_{\rm y,z}[\log(1 - D(y,\mathbf{G}(y,\mathbf{z})))]\right] \tag{2}$$

$$L\_{\text{tot}} = L\_{\text{cGAN}}(\theta\_{\mathcal{G}}, \theta\_d) + \lambda \cdot L\_1(\theta\_{\mathcal{G}}) \tag{3}$$

where *x* is the target CT, *y* is the input MRI, *z* the noise, *G*(*y*,*z*) the sCT, *D*(*y*,*x*) the PatchGAN output, *θg*, *θd* are the trainable parameters of generator and discriminator, and λ = 100 is a weight for the *L1* norm, as from Isola et al. [19]. The role of this additional loss is to have the generator not only mislead the discriminator, but also generate synthetic images that mimic the target CT in an *L1* sense, reducing the blurring and improving the representation of structures. The cGAN loss was evaluated on the channel triplets, while the *L1* norm was evaluated after reassembling the sCT slice.

The MRI represents the conditional input of the net, with the noise z being provided in the form of dropout on several layers of the U-net generator during training and testing. The training was performed by alternating one gradient descent step on the discriminator and one on the generator, using ADAM optimizer with momentum parameters β1 = 0.5, β2 = 0.999. The training was stopped after 20 epochs, which were sufficient to achieve convergence thanks to data augmentation.

The cGAN was optimized through a six-fold cross-validation (CV), setting the batch size to 1 and discriminator learning rate to 2 <sup>×</sup> <sup>10</sup>−<sup>7</sup> (Supplementary Material S2). This confirmed that the instance normalization (i.e., the use of batch-normalization layers with batch = 1) is well-suited for image generation tasks [36]. The network was trained on the full dataset (i.e., 32 volumes) and tested on the remaining five volumes (Figure 1c). Before stacking all of the output slices to rebuild the synthetic volumes, the values of each channel were re-scaled to the corresponding HU range and re-assembled by means of the same

masks used for the channels' segmentation. The stacked volumes were finally resized to the original MRI [320 × 260 × Nslices] size.

**Figure 1.** (**a**) U-net generator. (**b**) PatchGAN discriminator.

#### *2.4. Experiments*

The results were evaluated by means of similarity metrics such as mean absolute error (MAE), root mean squared error (RMSE), normalized cross correlation (NCC), structural similarity index (SSIM) and peak signal-to-noise ratio (PSNR) between sCT and target CT (Supplementary Material S3), with the exclusion of the background, both within CV and testing scenarios. All metrics were evaluated on the basis of reassembled volumes, applying the corresponding tissue mask. Specifically, the five held-out patients were used (i) to create the test set, where CT-based masks were used for the analysis, and (ii) to build an MRI-only simulation set, where the use of CT-based masks was replaced by manual segmentation of the three channels directly on MRI (Figure 1c).

Within the MRI-only workflow, the evaluation of similarity metrics was performed between sCTs and reference volumes obtained by applying DIR between the target CT and MRI (i.e., pseudo ground truths, CTPGT) to compensate for MRI-CT inter-acquisition motion (Figure 2a).

**Figure 2.** (**a**) Example of MRI, CT, sCT and pseudo ground truth (CTPGT) on axial plane. CTPGT still shows visible discrepancies with MRI anatomical condition. (**b**) Example of inter-acquisition motion in a CT-MRI pair and the resulted sCT in CV. (**c**) Example of MRI, planning CT and synthetic CT from MRI-only scenario. In red, the segmentation of kidneys used for the geometrical analysis.

The MRI-only scenario was evaluated also through geometrical criteria: dice coefficient (DSC), 95th percentile Hausdorff distance (HD) and the center of mass distance (CoMD) were calculated on kidney segmentations of MRI, CT and sCT, to assess the quality of the sCT in terms of correct reproduction of soft tissues with respect to the MRI anatomy. HD, DSC and CoMD were assessed for each couple of segmented volumes (CT-MRI, sCT-CT, sCT-MRI). sCTs were compared to MRIs since we expected the sCT to be representative of the MRI anatomical condition; CT was compared to MRI to determine the initial mismatch, while sCTs were compared to target CTs to confirm that sCT was far from matching CT structures.

Due to the lack of a real ground truth, the net was also validated on a CT-MRI volume pairs obtained through a computational phantom (i.e., XCAT [37] for CT and correspondent ComBAT [38] for MRI) that guaranteed an improved match of anatomical structures between the two volumes, avoiding any inter-acquisition motion (Supplementary Material S4).

The clinical CIRT plans were recalculated on the MRI-only sCT for each patient through the TPS and evaluated on the basis of DVH-based metrics (GTV D95%, CTV D95%, D2% on OARs) and dose difference maps with respect to the original plan. A two onesided test of equivalence for paired samples (TOST-P) was used to compare the DVH metrics, considering a confidence interval of 95% and an equivalence interval of ±0.5%. The global gamma analysis was also performed with 1 mm/1%, 2 mm/2%, 3 mm/3% as

tolerance criteria, and three dose thresholds: 10%, 50%, and 90% on the prescription dose. A range shift (*RS*) analysis was performed to take into account possible dose shifts, which are generally averaged in gamma analysis [15,39]. In this regard, both range shift (*RS*) and relative range shift (*RRS*) were evaluated on a beam-by-beam basis, considering an acceptability threshold set to 5 mm, in accordance with clinical margins used at CNAO [40]. RS and RRS are defined as:

$$RS \, = \, \left( R\_{\rm sCT80} - R\_{\rm CT80} \right) \tag{4}$$

$$\text{RRS} = \begin{pmatrix} \text{RS} \left< \begin{matrix} \text{RS} \\ \text{R}\_{\text{CT80}} \end{matrix} \right> \tag{5}$$

where *RsCT80* and *RCT80* are the beam ranges computed at 80% of the dose peak. This evaluation was performed on the dose profile along the central axis of each beam, considering 10 transversal slices, for a total of 60 RSs.

All computational steps were performed on a Precision 5820 Tower DELL workstation equipped with a 16 GB RAM Nvidia GPU (QUADRO P5000). A full training procedure took around 12 h, while the generation of a synthetic volume took ~5 s.

#### **3. Results**

The similarity metrics from CV, testing and MRI-only simulation are presented in Table 2. Comparable results were found between CV and test performance, with MAE on soft tissues and air channels being lower than that on bone. Notwithstanding interacquisition discrepancies (Figure 2b), the air channel showed an error of 54.42 ± 11.48 HU, the soft tissues 55.39 ± 3.41 HU, and the bone structures 86.03 ± 10.76 HU.

**Table 2.** Average results for CV, test and MRI-Only procedures, compared to the literature. Average (St. Dev.). \* Soft tissue. \*\* Lungs. \*\*\* Vertebral bodies. \*\*\*\* Bidirectional network.


The results relative to the MRI-only evaluation were characterized by larger values on all metrics. In particular, the errors grew to 279.01 ± 142.46 HU on the air channel, 154.87 ± 22.90 HU in the case of bone structures, and 88.22 ± 9.88 HU within the body.

The geometrical analysis of the MRI-only workflow (Figure 2c) showed the lowest discrepancies between CT and MRI segmentation, confirming good accuracy in replicating MRI anatomy. As an example, the average CoMD in sCT-MRI was 5.85 ± 4.87 mm versus 7.75 ± 5.92 mm in CT-MRI and 13.30 ± 10.42 mm for sCT-CT. Detailed results are reported in Supplementary Material S5.

The validation performed on the phantom showed an MAE of 73.3 HU on the whole volume, while the MAEs on the three channels were of 66.11 HU, 77.93 HU and 167.36 HU for soft tissues, air, and bones, respectively (see Supplementary Material S4).

As for dose accuracy, Figure 3 shows the DVH comparison for patient and dose recalculations on P17 and P27 (complete results reported in Supplementary Materials S6–S8). The GTV and CTV D95% as well as the D2% on the organs at risk (OARs) are displayed in Figure 4, expressed in terms of dose difference Δ(sCT-CT), MAE and relative error (E[%]) with respect to the prescribed dose. For all five patients, good reproducibility was shown relative to the dose to the GTV, with patients P20 and P21 being characterized by errors of −2.04 Gy[RBE] and −1.88 Gy[RBE] respectively, corresponding to −5.3% and −3.3% of the prescribed dose. The MAE on the GTV D95% was 0.86 ± 0.90 Gy[RBE].

**Figure 3.** (**a**) DVH comparison on patients P17 and P27; (**b**) original CIRT plan (RBE) and sCT-based recalculation for patients P17 and P27.

**Figure 4.** (**a**) D95% values for GTV and CTV in the original plan (CT) and the recalculated one (sCT). For P17, PTV was considered in this comparison (shown in Green). The table contains the dose values and the dose difference Δ[Gy[RBE]], as well as the error relative to the prescribed dose E [%]. (**b**) D2% difference (sCT-CT) for the main OARs on each patient, and the D2% MAE over each OAR. The red mark indicates the median, and edges of the box show the 25th and 75th percentiles.

Similarly, the values relative to the CTV showed the two dose distributions to have comparable results, with a maximum error of −5.10 Gy[RBE] for P21 (−8.9% of the prescribed dose). The MAE on the D95% CTV was 1.34 ± 1.33 Gy[RBE]. The D2% relative errors on OARs lay in an interquartile range (IQR) of [−0.24,0.22]%, although the colon reached a

peak relative error of 37.03% (14.22 Gy[RBE]), and the duodenum of 16.25% (7.8 Gy[RBE]) (Supplementary Material S6). The voxel-wise dose difference maps are shown in Figure S5 in Supplementary Material S7, highlighting the worst-case slice for each test patient. Regions with a high dose difference can be mainly seen with correspondence of air pocket mismatches, while the overall distribution of dose to the body is comparable. Indeed, as in Table S6 in Supplementary Material S7, all test patients showed a median dose difference close to zero, with the widest IQR being 0.158 Gy[RBE] on patient P21. The maximum errors were in the range [22.64,42.36] Gy[RBE]. In this regard, an incomplete reproduction of the kidney affected the dose distribution in patient P27 (Figure S6), while the limited field of view of MRI introduced dose artefacts on recalculation for patient P21 (Figure S7).

A peak gamma pass rate of 94.88% was obtained in the 3%/3 mm analysis (Supplementary Material S9).

The range shift analysis showed a median (IQR) RS of 0.98 (3.64) mm and RRS of 0.61 (2.14)% (Figure 5). Considering each beam individually, as shown in Table S9 in Supplementary Material S10, patient P27 was the one showing the highest errors, with a median RS of 5.69 (6.97) mm.

**Figure 5.** Representative range shift analysis on patient P27. (**a**) The graph shows both dose and HU profiles in CT (light blue) and sCT (red), evaluated along the yellow line shown in (**b**). The 80% reference is marked by the horizontal line. (**b**) Corresponding CT and sCT sagittal views are compared. The yellow line is one of the 10 considered for each beam, on different transversal slices.

#### **4. Discussion**

In this work, we investigated for the first time the feasibility of a cGAN in generating sCTs of the abdominal site for applications in CIRT. The cGAN, trained with transversal CT-MRI slice pairs, was optimized to work on three channels corresponding to air, bone and soft tissues to better account for the anatomical complexity of the abdomen.

The performance of the network was firstly evaluated on the basis of similarity metrics of the test set built with CT-based segmentation. Despite MRI to CT inter-acquisition motion, the MAE on the body (57.08 ± 2.79 HU) was comparable to results obtained by other works on the abdominal site that used multiple Dixon sequences and more complex architectures (55.56 ± 2.27 HU [26] and 60.42 ± 2.27 HU [23]), as well as U-nets trained on T1w–T2w MRI acquisitions (62 ± 13 HU) [24]. In addition, this work favorably compares to other studies exploiting cGAN or cycleGAN in terms of MAEBODY [20,21,25], even if the bidirectional network from Xu et al. achieved outstanding results, although working on a much wider unpaired dataset [22]. The NCC (0.92 ± 0.02) was shown to be consistent with those of other works [23,25,26], while the PSNR (27.64 ± 0.68 dB) was comparable to the work by Fu et al. [21]. Optimal MAE metrics were obtained in the generation of bone structures (86.03 ± 10.76 HU) and soft tissues (55.39 ± 3.4 HU), outperforming other works in the literature [20,24,25,28].

The MAE on the air channel and bones was also low with respect to other approaches with cycleGAN [25]. This could be due to the use of CT-derived masks on the test set, which may have aided the replication of CT air pockets and bone structures on the sCTs.

In order to cope with this and to simulate a real-case scenario, the network was then tested on an MRI-only simulation set, where the MRI segmentation was completely independent from the use of CT. Given the limited performance of multi-modal DIR on the considered dataset, manual segmentation of the three channels on MRI was considered the most accurate approach, despite the time-consuming task for clinical purposes. Nonetheless, DIR was applied between the target CT and MRI to overcome the lack of a ground truth CT representative of MRI anatomical condition, notwithstanding the minor contribution of such registration. Indeed, this process did not fully compensate for different air cavities and inter-acquisition motion; therefore, the volumes (i.e., CTPGT) used as reference were still showing visible discrepancies with respect to the MRI, biasing the evaluation of the performance of the network (Figure 2a). A similar consideration applied by Florkow et al. highlighted errors in HU intensities caused by inter-acquisition variations that were not compensated by the deformable registration [24]. Moreover, the manual segmentation of MRI represented a complex step, especially for bones, since they are not clearly visible on VIBE volumes. Nonetheless, the results were shown to still be acceptable when compared to the literature, with MAE on soft tissues (75.00 ± 8.12 HU) being aligned to results obtained with GAN (90 ± 29 HU) or cycleGAN (58.62 ± 30.61 HU) [25,27].

The geometrical analysis, performed on the MRI-only test set, was conducted to support the good geometrical agreement between sCT and MRI: in general, sCT and MRI segmentations showed the best match, confirming the expected performance of the net in reproducing the MRI anatomy, while sCTs showed higher deviations with respect to the target CT scans. This evaluation was performed on segmentations of the kidneys, which are well contrasted organs, and can be considered representative of the geometrical accuracy in the reproduction of soft tissues.

Due to the lack of a proper ground truth on patients' data, we supported our results with a validation performed on a single CT-MRI volume pair obtained through a computational phantom (i.e., XCAT [37] for CT and correspondent ComBAT [38] for MRI). This approach, which guaranteed the perfect anatomical match between the two volumes, provided promising outcomes (i.e., MAE of 73.3 HU on the whole volume) aligned with the presented results and the literature. The MAE on the three channels described a discrete reproduction of the air fillings and soft tissues, with poorer performance on bones, as noticed also in the literature. NCC reached 0.89, describing an acceptable reproduction of the anatomical structures (see Supplementary Material S4).

For dose analysis, most patients (Figure 4a and Supplementary Materials S6–S8) presented DVH metrics within clinical tolerances (i.e., below 3% of the prescribed dose) and aligned with studies in the literature on protons (relative error < 3% on PTV [25] and relative error < 2% on ITV—the internal target volume [24]). Moreover, *p*-values gave statistical evidence of the equivalence between the DVH metrics from sCT and CT (*p* < 0.05, Table S7 in Supplementary Material S8); however, a larger test dataset would make these considerations more solid.

Concerning DVH metrics, Liu Y. et al. for protons [25] as well as Liu L. for photons [28] obtained promising results, with errors on the maximum dose on the PTV being in the range [−1,+1] Gy[RBE], also thanks to the use of lateral beams that avoided most of the air-filled organs. Florkow et al. [24] obtained acceptable errors on the OARs, with D2% being in the range [−2.7,3.7]% of the prescribed dose. Notwithstanding the use of DIR, the work by Florkow et al. suffered from inter-scan variations (i.e., air fillings), that may have caused an overestimation of the actual differences between the planning CT and sCT [24]. In our work, the IQR on the D2% relative error was [−0.24,0.22]%, but high errors were highlighted on the colon (37.03%) and the duodenum (16.25%, Figure 4c), which were highly affected by inter-acquisition motion of air fillings between sCT (representative of MRI anatomy) and planning CT. In this regard, the work by Knäusl et al. [32] showed that the constraints on OARs are very challenging for compliance, presenting errors of up to 28%, mostly due to the incorrect representation of bones or air cavities. Discrepancies in the dose distributions were also confirmed by the gamma analysis, which in our work showed a peak value of 94.88 ± 4.9% against the 99.37 ± 0.99% reported in the literature [25]. Table S8 in Supplementary Material S9 shows a comparison of gamma pass rates from relevant studies on proton and photon applications, which were, therefore, not fully comparable to our application and highlighted the need for more studies on CIRT. The range shift analysis presented median RS within the clinical threshold (i.e., 5 mm), but with critical results for patients P27 and P31 (median (IQR), 5.69 (6.97) mm and 3.09 (2.60) mm, respectively), because of the presence of unmatched air pockets (Figure 5b). In proton plans, a maximum RS value of 5.6 mm (5.68%) was reported [25], whereas in our case, the inconsistencies of air cavities led to RS values of up to 15.37 mm (i.e., RRS = 9.09%, patient P27). This aspect may be critical for CIRT application, and needs to be analyzed on a wider population.

Similarly, the highest dose discrepancies were mostly found in correspondence of the different disposition of air pockets between sCT and planning CT (Figure S5 in Supplementary Material S7). The maximum error for patient P27 was due to an incomplete reproduction of the kidney (Figure S6), which caused an overdose for duodenum (+5.7% D02) with respect to the prescribed dose (Supplementary Material S6). The dose artefacts on recalculation for patient P21 reported regions of high dose differences (Figure S7), but this issue was not correlated with the quality of the sCT and could be easily overcome by acquiring wider volumes.

The main limitation of the study was the lack of a proper ground truth to validate the proposed approach, which could not be fully compensated due to the poor performance of multi-modal DIR in the abdominal site. As such, the use of computational phantoms [37,38] to ensure the correspondence between CT and MRI scans will be considered in the future as an effective approach to validate the proposed network, as anticipated in this work (Supplementary Material S4), and enlarge the training dataset. In addition, although the three channel implementation allowed good performance of the net in such a complex anatomical site with limited data, manual segmentation can be demanding, especially for not well-contrasted structures in VIBE acquisitions, such as bone, and is definitely not suited for clinical application. Further steps could be, therefore, to include the acquisition of specific MRI sequences (e.g., ultrashort echo time, UTE) to facilitate bone segmentation or avoid channel separation in an improved version of the net [26]. We also expect that our results could be improved by increasing the dataset; in future analyses, we intend to include different respiratory phases to (i) achieve higher accuracy and limit errors in the reproduction of tissues, (ii) eliminate separation in the three channels, and (iii) derive synthetic respiratory-correlated 4DCT [10,41]. Finally, although our results were mainly affected by air-filling effects, the absence of the thermoplastic mask on the sCT could also have an impact [32]; as such, a uniform and pre-defined outline to the sCT could be applied [32], although it would not be an optimal countermeasure.

Despite the above-mentioned limitations, this work showed that the three-channel cGAN can generate accurate sCTs of the abdominal site that can support treatment planning, evaluation, and adaptation in CIRT. To the authors' best knowledge, this work is the first analysis applied to the abdomen for CIRT, and thus represents a starting point for future in-depth analyses of the feasibility of MRI-only workflows in CIRT.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www. mdpi.com/article/10.3390/bioengineering10020250/s1, Figure S1: Average MAE values on the three channels, for each batch tested; Figure S2: Coronal view of the XCAT phantom (A. original, B. with gaussian noise), ComBAT phantom (C) and the resulting synthetic CT (D); Figure S3: A. Histogram representing the HD 95th percentile. B. Histogram relative to the CoM distances. C. Histogram of Dice coefficient; Figure S4: DVH graphs comparing original plan (solid line) with sCT-based recalculations (dotted line); Figure S5: CT, sCT and dose difference maps for the test patients. The transversal slice shown contains the pixel with maximum error (black dot). Red arrows show the possible causes; Figure S6: Coronal view of P27, showing an error in the reproduction of the kidney, that affects the dose distribution; Figure S7: Dose recalculation on patient P21, coronal view. The discrepancy in the MRI-CT FoV causes dose artefacts in the recalculation; Figure S8: Representative range shift analysis on patient P17. The graph shows the Dose and HU profiles evaluated along the white line, as shown on CT and sCT below. The horizontal black line represents the 80% of the peak of dose; Table S1: DIR vs No-DIR average metrics results on the test set (St. Dev.); Table S2: Cross validation results, Table S3: Results of the network validation on the phantom; Table S4: Geometrical evaluation of kidney's segmentations; Table S5: D2% Relative errors on each OAR. The values are referred to the prescribed dose; Table S6: Dose difference median values [IQR] for the test patients on the whole plan; Table S7: TOST-P test of equivalence P-values; Table S8: Average Gamma pass rates (St. Dev.) of this work and literature references; Table S9: Median (IQR) RS and RRS evaluated over 10 transversal slices for each beam. Ap = Anterior-posterior. L = Lateral.

**Author Contributions:** Conceptualization, G.P., A.V., C.P., S.M., G.M. and G.B.; methodology, G.P., C.P., G.M. and N.G.; software, G.P.; validation, G.P. and F.C.; formal analysis, G.P., C.P., A.N. and N.G.; investigation, G.P.; resources, G.B.; data curation, G.P., V.V., A.B., S.M., A.V. and M.C.; writing original draft preparation, G.P. and C.P.; writing—review and editing, A.N., N.G., M.C., A.P., S.M. and E.O.; visualization, G.P. and C.P.; supervision, C.P., G.B. and A.V.; project administration, G.B., C.P. and E.O.; funding acquisition, G.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Marie Skłodowska-Curie grant RAPTOR—Real-Time Adaptive Particle Therapy of Cancer, grant agreement No. 955956 and partially supported by AIRC (Associazione Italiana per la Ricerca sul Cancro), grant number IG2020-24946.

**Institutional Review Board Statement:** The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board (or Ethics Committee) of CEN-TRO NAZIONALE DI ADROTERAPIA ONCOLOGICA (CNAO) (protocol code CNAO 37-2019 4D-MRI, 2019).

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** Data are unavailable due to privacy and ethical restrictions.

**Acknowledgments:** The authors would like to thank Luca Anemoni and Alice Mancin for patients' data collection. Anestis Nakas would like to express thanks for the support of the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant RAPTOR— Real-Time Adaptive Particle Therapy of Cancer.

**Conflicts of Interest:** The authors declare no conflict of interest.
