#### *3.1. Data and Augmentation*

We conducted experiments on three types of lung data: a TCIA [40–43] patient with a tumor, Dirlab [44] lung CT scans without tumors, and a CIRS phantom. ITK-SNAP was used for automatic segmentation to obtain the labels. From the TCIA data, we selected one patient for the experiment; from Dirlab, we selected the first five cases. In the CIRS phantom, the lung tumor was simulated with a water sphere. All 3D CT images were resampled to 128 × 128 × 128 voxels with a spacing of 1 mm × 1 mm × 1 mm. Since our experiment addresses 2D/3D registration, paired 2D projections and 3D medical images acquired at the same moment are rare, and it is unethical to expose the human body to additional radiation doses, so the first task is data augmentation. Moreover, for 2D/3D registration during treatment (e.g., radiotherapy, surgical navigation), the focus is clearly on a specific person. Therefore, we chose a hybrid data augmentation approach to train a deep learning-based 2D/3D registration model for a specific human body.
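As an illustration of this preprocessing step, the sketch below resamples a CT volume to 128 × 128 × 128 voxels with 1 mm isotropic spacing using SimpleITK; the toolkit choice, the file name, and the origin/cropping handling are assumptions, since the paper does not specify them.

```python
import SimpleITK as sitk

def resample_ct(image, out_size=(128, 128, 128), out_spacing=(1.0, 1.0, 1.0)):
    """Resample a CT volume to a fixed grid size and isotropic voxel spacing."""
    resampler = sitk.ResampleImageFilter()
    resampler.SetSize(out_size)
    resampler.SetOutputSpacing(out_spacing)
    resampler.SetOutputOrigin(image.GetOrigin())       # cropping/centering omitted
    resampler.SetOutputDirection(image.GetDirection())
    resampler.SetInterpolator(sitk.sitkLinear)
    resampler.SetDefaultPixelValue(-1000)              # fill out-of-field voxels with air
    return resampler.Execute(image)

# ct = sitk.ReadImage("patient_4dct_phase0.nii.gz")    # hypothetical file name
# ct_iso = resample_ct(ct)
```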

In the hybrid data augmentation shown in Figure 1b, we first selected the end-expiratory phase of the 4D CT as the moving image *M*<sub>CT</sub> and the remaining phases as the fixed images *CT*<sub>1</sub>, . . . , *CT*<sub>9</sub>. Then, we used a conventional intensity-based image registration method to obtain nine deformation fields *φ*<sub>1</sub>, . . . , *φ*<sub>9</sub>. For each augmented sample, two of the nine deformation fields, say *φ*<sub>*i*</sub> and *φ*<sub>*j*</sub>, were selected arbitrarily and superimposed with random weights to obtain a large number of inter-phase deformations. The lung may also change within a given phase during respiratory motion. Therefore, we used thin plate spline (TPS) interpolation to simulate small changes in specific phases: the number of control points *N* was randomly chosen between 20 and 60, and the displacement of each control point was chosen between 0 mm and 20 mm, yielding many phase-specific random deformations. To obtain more morphologically diverse images, we combined the inter-phase and phase-specific deformations with random weights to obtain many hybrid deformation fields. Spatially warping the moving image and its segmentation with these fields produced CT and segmentation images representing each respiratory phase of the lung, as sketched below.
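A minimal NumPy sketch of this sampling procedure is given below. The control-point count and displacement ranges follow the text, but the Gaussian smoothing used to spread the control-point shifts is a simplified stand-in for the exact TPS interpolation, and `phis` (the nine precomputed deformation fields) is a hypothetical input.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def hybrid_deformation(phis, shape, rng, n_ctrl=(20, 60), max_shift=20.0):
    """Sample one hybrid displacement field of shape (3, D, H, W), in voxels (1 mm spacing)."""
    # Inter-phase component: random convex combination of two of the nine fields.
    i, j = rng.choice(len(phis), size=2, replace=False)
    w = rng.uniform(0.0, 1.0)
    inter = w * phis[i] + (1.0 - w) * phis[j]

    # Phase-specific component: N control points shifted by up to 20 mm, spread into
    # a smooth field (Gaussian smoothing stands in for the TPS solve described above).
    intra = np.zeros_like(inter)
    n = rng.integers(n_ctrl[0], n_ctrl[1] + 1)
    for _ in range(n):
        z, y, x = (rng.integers(0, s) for s in shape)
        intra[:, z, y, x] = rng.uniform(-max_shift, max_shift, size=3)
    intra = gaussian_filter(intra, sigma=(0, 8, 8, 8))

    # Hybrid field: random weighting of the two components.
    a = rng.uniform(0.0, 1.0)
    return a * inter + (1.0 - a) * intra

def warp(volume, field):
    """Warp a 3D volume with a dense displacement field (pull-back sampling)."""
    coords = np.indices(volume.shape).astype(np.float32) + field
    return map_coordinates(volume, coords, order=1, mode="nearest")
```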

#### *3.2. DRR Image Generation*

The orthogonal-angle X-ray projection system used in this experiment is shown in Figure 3. Two point light sources at orthogonal angles emit rays through the object, which are projected onto two detectors perpendicular to the respective central axes. Let *I*<sub>0</sub> be the initial intensity at the light source, *μ*(*l*) the attenuation coefficient of the object along the ray path, *l* the path length through the object, and *I*<sub>*n*</sub> the intensity of the ray after passing through the object. The Beer–Lambert law gives *I*<sub>*n*</sub> = *I*<sub>0</sub>*e*<sup>−∫ *μ*(*l*) *dl*</sup>. Accumulating the attenuation along each ray path and converting it back to intensity produces the X-ray image. In this experiment, like most researchers, we used DRR images, which follow the same imaging principle, instead of real X-rays: virtual X-rays were cast through the CT volume and, after attenuation, projected onto the imaging plane to reconstruct the DRR images. The 3D CT images representing each respiratory phase after data augmentation were projected with this method to obtain DRR images at the corresponding moments. This technique has been widely used in 2D/3D registration methods; a simplified projection sketch is given after Figure 3.

**Figure 3.** Schematic diagram of DRR image generation.
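For reference, the sketch below generates a DRR with a parallel-beam simplification of the point-source geometry in Figure 3; the HU-to-attenuation conversion and the value of `mu_water` are assumptions, and this is not the exact ray-casting implementation used in this work.

```python
import numpy as np

def drr_parallel(ct_hu, axis=0, mu_water=0.02, spacing_mm=1.0, i0=1.0):
    """Approximate a DRR by integrating attenuation along one axis (parallel-beam)."""
    # Convert Hounsfield units to linear attenuation coefficients (1/mm).
    mu = np.clip(mu_water * (1.0 + ct_hu / 1000.0), 0.0, None)
    # Discrete Beer-Lambert line integral: I_n = I_0 * exp(-sum(mu * dl)).
    line_integral = mu.sum(axis=axis) * spacing_mm
    return i0 * np.exp(-line_integral)

# Two orthogonal projections of a 128^3 CT volume (axes chosen for illustration):
# drr_ap  = drr_parallel(ct_volume, axis=1)
# drr_lat = drr_parallel(ct_volume, axis=2)
```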

#### *3.3. Experimental Details*

We used hybrid data augmentation to obtain 6000 samples from the three types of experimental data. Of these, 5400 were used as the training set, 300 as the validation set, and 300 as the test set. Our experiments were implemented with the deep learning framework PyTorch 1.10 on an NVIDIA A6000 GPU with 48 GB of memory and an AMD Ryzen 7 3700X 8-core processor with 128 GB of RAM. The learning rate was set to 10<sup>−4</sup>. For all datasets, the batch size was set to 8 and the optimizer was Adam, as illustrated in the sketch below.
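The sketch below illustrates these settings (Adam, learning rate 10<sup>−4</sup>, batch size 8) in PyTorch; the model, tensors, and loss are hypothetical stand-ins rather than the actual registration network and augmented dataset.

```python
import torch
import torch.nn as nn

# Settings reported above: Adam optimizer, learning rate 1e-4, batch size 8.
# The tiny model and the dummy tensors are placeholders, not the actual network/data.
model = nn.Sequential(nn.Conv3d(2, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv3d(8, 3, 3, padding=1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

dummy_inputs = torch.zeros(16, 2, 64, 64, 64)    # placeholder for the 2D/3D inputs
dummy_targets = torch.zeros(16, 3, 64, 64, 64)   # placeholder for target fields
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(dummy_inputs, dummy_targets),
    batch_size=8, shuffle=True)

for inputs, targets in loader:                   # one epoch of the toy example
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    optimizer.step()
```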

#### *3.4. Experiment Evaluation*

To verify that our model can achieve 2D/3D registration from two orthogonal-angle projections, we selected the end of expiration as the moving image and registered it toward each of the remaining phases. We evaluated the three lung datasets using NCC, MI, 95% Hausdorff surface distance, and Dice. In addition, to explore how well our model can track lung tumors, we compared the predictions with the ground truth on the datasets containing tumors and quantitatively evaluated them using Dice and the tumor center-of-mass error, as sketched below.
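For reference, the sketch below shows one common way to compute the Dice, the 95% Hausdorff surface distance, and the tumor center-of-mass error from binary masks; it is a generic NumPy/SciPy implementation, not necessarily the exact evaluation code used in this study.

```python
import numpy as np
from scipy import ndimage
from scipy.spatial import cKDTree

def dice(a, b):
    """Dice overlap between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def hd95_mm(a, b, spacing=(1.0, 1.0, 1.0)):
    """95th-percentile symmetric surface distance in mm."""
    def surface_pts(m):
        m = m.astype(bool)
        border = m ^ ndimage.binary_erosion(m)          # surface voxels
        return np.argwhere(border) * np.asarray(spacing)
    pa, pb = surface_pts(a), surface_pts(b)
    d_ab = cKDTree(pb).query(pa)[0]
    d_ba = cKDTree(pa).query(pb)[0]
    return float(np.percentile(np.concatenate([d_ab, d_ba]), 95))

def com_error_mm(pred, gt, spacing=(1.0, 1.0, 1.0)):
    """Euclidean distance (mm) between the tumor centers of mass."""
    c_pred = np.array(ndimage.center_of_mass(pred.astype(float)))
    c_gt = np.array(ndimage.center_of_mass(gt.astype(float)))
    return float(np.linalg.norm((c_pred - c_gt) * np.asarray(spacing)))
```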

#### **4. Results**

#### *4.1. Registration from the Expiratory End to Each Phase*

Here, we demonstrate the registration results from the end of expiration to each of the other phases, up to the end of inspiration, for the TCIA, Dirlab, and phantom data. For the qualitative assessment, Figure 4a shows the results for the selected TCIA patient with a tumor, Figure 4b shows a randomly selected case from Dirlab, and Figure 4c shows the registration of the phantom data. The odd rows are the unregistered images, and the even rows are the registered results. The results show that registration from the end of expiration to the remaining phases is achieved for the TCIA patient with a tumor, the tumor-free Dirlab subjects, and the phantom with a water sphere simulating a tumor.

For the quantitative analysis, we used the Dice of the segmentation maps, the 95% Hausdorff surface distance, and the NCC and MI of the grayscale images to evaluate our scheme. The results are shown in Table 1. Good registration results are obtained for all three types of data, not only on the grayscale images but also on the lung region of interest. The Dice values of all three data types are above 0.97, the Hausdorff surface distances are below 2 mm, the NCC values are above 0.92, and the MI values are above 0.90. Compared with the real human lungs, the NCC and MI of the phantom data are relatively low because the lung of the phantom itself does not change; only the internal water sphere moves, and its motion is more rigid than that of a real patient, so the NCC and MI are lower even at a higher Dice. Nevertheless, they remain above 0.92 and 0.90, respectively. Therefore, the quantitative and qualitative results show that the proposed method can achieve non-rigid 2D/3D registration for a specific subject from two orthogonal-angle projections.

**Figure 4.** Registration from the exhalation end to the other stages. (**a**) shows the results of our registration on TCIA, (**b**) a randomly selected set of experiments from Dirlab, and (**c**) the registration results of the phantom data. The odd-numbered rows are the unregistered contrast images, and the even-numbered rows are the registered contrast images.



#### *4.2. Tumor Location*

Both the TCIA patient and the phantom contained tumors. The accuracy of tumor localization was evaluated qualitatively and quantitatively.

Figure 5 shows the qualitative evaluation of the 3D tumor for the two data types, where (a) is a 3D visualization of the patient's whole lung and tumor, and (b) is the corresponding visualization for the phantom data. Table 2 presents the quantitative results, where we evaluated the tumor center of mass and the Dice. The tumor center-of-mass deviation is within 0.15 mm for the real patient and less than 0.05 mm for the phantom. The Dice values of both are above 0.88. Thus, the proposed method achieves registration both for the whole lung and for the tumor. Moreover, the fact that local tumors are well aligned suggests that our model could be useful for clinical applications such as tumor tracking.

**Figure 5.** Tumor registration results from the exhalation end to the other phases. (**a**) 3D rendering of a real patient's lung and tumor before and after registration; (**b**) 3D rendering of the tumor in the phantom data. The odd rows are the unregistered images, and the even rows are the registered images. Red indicates the ground truth, blue the moving image, and green the prediction obtained by the model.

**Table 2.** The accuracy of tumor location from the expiratory end to each phase.


#### **5. Discussion**

#### *5.1. Traditional Registration in Data Augmentation*

We used traditional intensity-based image registration for data augmentation to complete the registration between phases. Here, we evaluate the two phases with the largest deformation in the three datasets, the end of expiration and the end of inspiration. The experimental results are shown in Figure 6.

Figure 6 shows the results from three view directions before and after registration. The odd columns are the unregistered images, and the even columns are the results of the traditional registration method. Conventional image registration is achieved from the end of expiration to the end of inspiration for the real patients and the phantom, ensuring that our augmented data encompass all respiratory phases of the lung and, in addition, cover the larger deformations at the two extremes.

**Figure 6.** The traditional registration method results from the end of exhalation to the end of inhalation. The first two columns are the registration results on the patient, the middle two are the registration results on the normal human lung, and the last two are the registration results on the phantom. The odd columns are the unregistered images, and the even columns are the results of the registered images.

#### *5.2. Landmark Error*

For the Dirlab data, the landmark points and the deformation fields used for data augmentation are known. Thus, the landmark positions of the images generated after data augmentation are also known and serve as our ground truth. The mean target registration error (mTRE) was then computed between these ground-truth landmarks and the positions predicted by our model, as sketched below.
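The sketch below shows one common way the mTRE can be computed: landmarks are mapped through the predicted dense displacement field and compared with their known ground-truth positions. The coordinate and field conventions (z/y/x voxel order, field defined on the landmarks' grid) are assumptions and may differ from the actual evaluation code.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def mtre_mm(landmarks, landmarks_gt, disp_field, spacing=(1.0, 1.0, 1.0)):
    """Mean target registration error in mm.

    `landmarks` are voxel coordinates (N, 3) in z/y/x order on the grid where
    `disp_field` (3, D, H, W) is defined; `landmarks_gt` are the known
    ground-truth positions of the same points after deformation.
    """
    pts = np.asarray(landmarks, dtype=float)
    # Sample each displacement component at the landmark positions.
    disp = np.stack([map_coordinates(disp_field[d], pts.T, order=1)
                     for d in range(3)], axis=1)
    mapped = pts + disp
    diff = (mapped - np.asarray(landmarks_gt, dtype=float)) * np.asarray(spacing)
    return float(np.linalg.norm(diff, axis=1).mean())
```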

The evaluation results are shown in Table 3, and the corresponding box plot is shown in Figure 7, in which green indicates the data before registration, yellow the data after registration with the proposed method, and purple the data after 3D/3D registration using Demons [45]. The results show that the proposed model can achieve effective 2D/3D registration, but its accuracy is lower than that of existing advanced 3D/3D registration models because the input consists of only two 2D X-rays at orthogonal angles. Although the proposed method transforms 2D/3D registration into a 3D/3D registration problem, some image details are inevitably lost compared with true 3D images; tiny structures such as capillaries disappear, which lowers the landmark accuracy that depends on such fine detail. However, the 2D/3D registration task is more concerned with the global changes of the lung and the tumor location, and the proposed method greatly reduces the irradiation dose and improves the registration speed.


**Table 3.** Mean target registration error of landmarks in Dirlabs.

**Figure 7.** Box plot of landmark points in Dirlab; green indicates the data before registration, yellow is the data after registration of the proposed method, and purple represents the data after 3D/3D registration using Demons.

In addition, our method can complete a 2D/3D registration in 1.2 s, whereas other data-driven 2D/3D registration models, such as [20], may take a few seconds, and traditional registration methods may take tens of minutes or even hours. Our method also needs only two X-rays at different angles, which greatly reduces the radiation dose and simplifies the hardware required in the clinic. Of course, our method also has some limitations. First, since paired orthogonal-angle X-rays and corresponding 3D CT acquired at the same moment do not exist for real patients, we used 2D DRRs. Although DRRs and real X-rays follow the same imaging principle, there are undeniably some grayscale and noise differences between the two. These can be corrected with existing methods, such as histogram matching [24] or GAN-based translation [25], which is not the main focus of our study. We will also improve the implementation to increase the processing speed for radiotherapy or interventional procedures with stronger real-time requirements. Finally, since few non-rigid 2D/3D registration studies have released open-source code, we have not yet found a suitable method for direct comparison and will continue to look for one in future work.

#### **6. Conclusions**

This study proposes a deep learning-based 2D/3D registration method using two orthogonal-angle X-ray projection images. The proposed algorithm was verified on lung data with and without tumors and on phantom data, and it obtained high registration accuracy, with Dice and NCC greater than 0.97 and 0.92, respectively. In addition, we evaluated the accuracy on the data containing tumors, and the tumor center-of-mass error was within 0.15 mm, which indicates the promise of our model for tumor tracking. The registration time is within 1.2 s, which is promising for clinical applications such as radiotherapy or surgical navigation that track the shape of organs in real time. Moreover, we only need two orthogonal-angle X-rays to achieve 2D/3D deformable image registration, which can greatly reduce the extra dose during treatment and simplify the required hardware system.

**Author Contributions:** Conceptualization, G.D., J.D., Y.X. and X.L.; methodology, J.D. and X.L.; software, J.D.; validation, N.L.; investigation, N.L., H.W. and L.X.; data curation, G.D. and J.D.; writing—original draft preparation, G.D. and J.D.; writing—review and editing, C.Z., W.H., L.L., Y.C., L.L., Y.L. and X.L.; visualization, L.L. and X.L.; supervision, X.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work is partly supported by grants from the National Natural Science Foundation of China (U20A201795, U21A20480, 82202954, 61871374, 62001464, 11905286), Young S&T Talent Training Program of Guangdong Provincial Association for S&T, China (SKXRC202224), and the Chinese Academy of Sciences Special Research Assistant Grant Program.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The datasets used in this study are publicly available on TCIA at https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=21267414 (accessed on 27 October 2022) and Dirlab at https://med.emory.edu/departments/radiation-oncology/researchlaboratories/deformable-image-registration/downloads-and-reference-data/4dct.html (accessed on 27 October 2022).

**Acknowledgments:** We thank The Cancer Imaging Archive (TCIA) and Emory University School of Medicine (4D CT) for public access and for sharing their datasets.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:

