A Robust Method for Real Time Intraoperative 2D and Preoperative 3D X-Ray Image Registration Based on an Enhanced Swin Transformer Framework
Abstract
1. Introduction
2. Methods
2.1. The 2D–3D X-Ray Image Registration
2.1.1. Rigid Transformation
2.1.2. Digitally Reconstructed Radiograph (DRR)
2.1.3. Coordinate Establishment
2.1.4. Problem Formulation
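For orientation, the problem formulated in this subsection is the standard rigid 2D–3D registration setting described in the cited literature (e.g., Markelj et al. for the optimization view, Miao et al. for the regression view). The notation below is generic, not necessarily the authors' own; θ₀ stands for an assumed initial/reference pose.

```latex
% Generic intensity-based formulation: find the rigid pose parameters theta
% whose DRR, rendered from the preoperative CT volume V, best matches the
% intraoperative X-ray image I under a dissimilarity measure D.
\hat{\theta} \;=\; \arg\min_{\theta}\; \mathcal{D}\bigl( I,\; \mathrm{DRR}(V;\, T(\theta)) \bigr),
\qquad T(\theta) \in SE(3)

% Learning-based (regression) alternative: a network f_W maps the image pair
% directly to the six rigid parameters.
\hat{\theta} \;=\; f_{W}\bigl( I,\; \mathrm{DRR}(V;\, T(\theta_{0})) \bigr),
\qquad \theta = (d_x,\, d_y,\, d_z,\, \mathrm{roll},\, \mathrm{pitch},\, \mathrm{yaw})
```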
2.2. Registration System Framework
2.2.1. XPE-ST Overview
2.2.2. Dual Channel Image Input
2.2.3. Feature Extraction and Fusion Encoder
1. Feature extraction based on Swin transformer backbone
2. Feature map reweighting based on Squeeze-and-Excitation (SE) attention mechanism
3. Multi-layer feature fusion based on feature pyramid (a minimal sketch of how these three components fit together is given below)
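As a rough illustration of how these three components can be combined, the PyTorch sketch below applies SE channel reweighting to multi-scale backbone features and fuses them top-down in FPN fashion. The channel sizes, module names, and the backbone call are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch (PyTorch): multi-scale backbone features, SE channel
# reweighting (Hu et al.), and FPN-style top-down fusion (Lin et al.).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: reweight channels using global context."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                                # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                           # squeeze: global average pool -> (B, C)
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)       # excitation -> (B, C, 1, 1)
        return x * w                                     # channel-reweighted feature map

class SEFPNFusion(nn.Module):
    """SE reweighting per scale, then FPN-style top-down fusion."""
    def __init__(self, in_channels=(96, 192, 384, 768), out_channels=256):
        super().__init__()
        self.se = nn.ModuleList([SEBlock(c) for c in in_channels])
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])

    def forward(self, feats):                            # feats: list of maps, ordered fine to coarse
        feats = [se(f) for se, f in zip(self.se, feats)]
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        # top-down pathway: upsample each coarser map and add it to the next finer one
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        return laterals[0]                               # finest fused map feeds a pose regressor

# feats = swin_backbone(x)   # hypothetical backbone returning 4 feature maps
# fused = SEFPNFusion()(feats)
```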
3. Experiments
3.1. Dataset Preparation
3.1.1. CT Voxel Data Representing Three Distinct Anatomical Regions
3.1.2. Training and Testing Data Preparation Through DRR
3.2. Implementation Details
3.3. Comparative Experiments
4. Results and Discussion
4.1. Evaluation Matrix
4.2. Comparison with Traditional Optimization-Based and DL-Based X-Ray 2D–3D Registration Algorithms
4.3. Noise Robustness Testing
4.4. Ablation Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Cleary, K.; Peters, P.T. Image-guided interventions: Technology review and clinical applications. Annu. Rev. Biomed. Eng. 2010, 12, 119–142. [Google Scholar] [CrossRef] [PubMed]
- Markelj, P.; Tomazevic, D.; Likar, B.; Pernus, F. A review of 3D/2D registration methods for image-guided interventions. Med. Image Anal. 2012, 16, 642–661. [Google Scholar] [CrossRef] [PubMed]
- Peters, T.; Cleary, K. Image-Guided Interventions: Technology and Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2008. [Google Scholar]
- Guéziec, A.; Kazanzides, P.; Williamson, B.; Taylor, R.H. Anatomy-based registration of CT-scan and intraoperative X-ray images for guiding a surgical robot. IEEE Trans. Med. Imaging 1998, 17, 715–728. [Google Scholar] [CrossRef]
- Murphy, M.J. An automatic six-degree-of-freedom image registration algorithm for image-guided frameless stereotaxic radiosurgery. Med. Phys. 1997, 24, 857–866. [Google Scholar] [CrossRef]
- Darzi, F.; Bocklitz, T. A Review of Medical Image Registration for Different Modalities. Bioengineering 2024, 11, 786. [Google Scholar] [CrossRef]
- Hill, D.L.; Batchelor, P.G.; Holden, M.; Hawkes, D. Medical image registration. Phys. Med. Biol. 2001, 46, R1. [Google Scholar] [CrossRef] [PubMed]
- Powell, M.J. An efficient method for finding the minimum of a function of several variables without calculating derivatives. Comput. J. 1964, 7, 155–162. [Google Scholar] [CrossRef]
- Nelder, J.A.; Mead, R. A simplex method for function minimization. Comput. J. 1965, 7, 308–313. [Google Scholar] [CrossRef]
- Hansen, N.; Ostermeier, A. Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation. In Proceedings of the IEEE International Conference on Evolutionary Computation, Nagoya, Japan, 20–22 May 1996; pp. 312–317. [Google Scholar]
- Livyatan, H.; Yaniv, Z.; Joskowicz, L. Gradient-based 2-D/3-D rigid registration of fluoroscopic X-ray to CT. IEEE Trans. Med. Imaging 2003, 22, 1395–1406. [Google Scholar] [CrossRef]
- Frysch, R.; Pfeiffer, T.; Rose, G. A novel approach to 2D/3D registration of X-ray images using Grangeat’s relation. Med. Image Anal. 2021, 67, 101815. [Google Scholar] [CrossRef] [PubMed]
- Ban, Y.; Wang, Y.; Liu, S.; Yang, B.; Liu, M.; Yin, L.; Zheng, W. 2D/3D multimode medical image alignment based on spatial histograms. Appl. Sci. 2022, 12, 8261. [Google Scholar] [CrossRef]
- Haskins, G.; Kruger, U.; Yan, P. Deep learning in medical image registration: A survey. Mach. Vis. Appl. 2020, 31, 8. [Google Scholar] [CrossRef]
- Hou, B.; Alansary, A.; McDonagh, S.; Davidson, A.; Rutherford, M.; Hajnal, J.V.; Rueckert, D.; Glocker, B.; Kainz, B. Predicting slice-to-volume transformation in presence of arbitrary subject motion. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2017: 20th International Conference, Quebec City, QC, Canada, 11–13 September 2017; Proceedings, Part II 20. 2017; pp. 296–304. [Google Scholar]
- Miao, S.; Wang, Z.J.; Liao, R. A CNN regression approach for real-time 2D/3D registration. IEEE Trans. Med. Imaging 2016, 35, 1352–1363. [Google Scholar] [CrossRef] [PubMed]
- Liao, H.; Lin, W.-A.; Zhang, J.; Zhang, J.; Luo, J.; Zhou, S.K. Multiview 2D/3D rigid registration via a point-of-interest network for tracking and triangulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 12638–12647. [Google Scholar]
- Hu, X.; Chen, J.; Chen, Y. RegMamba: An Improved Mamba for Medical Image Registration. Electronics 2024, 13, 3305. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020. [Google Scholar] [CrossRef]
- Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jegou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the 38th International Conference on Machine Learning, Proceedings of Machine Learning Research, Virtual, 18–24 July 2021; pp. 10347–10357. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
- Tang, Y.; Yang, D.; Li, W.; Roth, H.R.; Landman, B.; Xu, D.; Nath, V.; Hatamizadeh, A. Self-supervised pre-training of swin transformers for 3d medical image analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 20730–20740. [Google Scholar]
- Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-unet: Unet-like pure transformer for medical image segmentation. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 205–218. [Google Scholar]
- Bushberg, J.T.; Boone, J.M. The Essential Physics of Medical Imaging; Lippincott Williams & Wilkins: New York, NY, USA, 2011. [Google Scholar]
- Razi, T.; Niknami, M.; Ghazani, F.A. Relationship between Hounsfield unit in CT scan and gray scale in CBCT. J. Dent. Res. Dent. Clin. Dent. Prospect. 2014, 8, 107. [Google Scholar] [CrossRef]
- Kruger, J.; Westermann, R. Acceleration techniques for GPU-based volume rendering. In Proceedings of the IEEE Visualization, VIS 2003, Seattle, WA, USA, 22–24 October 2003; pp. 287–292. [Google Scholar]
- Gopalakrishnan, V.; Golland, P. Fast auto-differentiable digitally reconstructed radiographs for solving inverse problems in intraoperative imaging. In Proceedings of the Workshop on Clinical Image-Based Procedures, Singapore, 18 September 2022; pp. 1–11. [Google Scholar]
- Shechter, G.; Shechter, B.; Resar, J.R.; Beyar, R. Prospective motion correction of X-ray images for coronary interventions. IEEE Trans. Med. Imaging 2005, 24, 441–450. [Google Scholar] [CrossRef] [PubMed]
- Siddon, R.L. Fast calculation of the exact radiological path for a three-dimensional CT array. Med. Phys. 1985, 12, 252–255. [Google Scholar] [CrossRef]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
- Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Podobnik, G.; Strojan, P.; Peterlin, P.; Ibragimov, B.; Vrtovec, T. HaN-Seg: The head and neck organ-at-risk CT and MR segmentation dataset. Med. Phys. 2023, 50, 1917–1927. [Google Scholar] [CrossRef] [PubMed]
- de la Iglesia Vayá, M.; Saborit, J.M.; Montell, J.A.; Pertusa, A.; Bustos, A.; Cazorla, M.; Galant, J.; Barber, X.; Orozco-Beltrán, D.; García-García, F.; et al. BIMCV COVID-19+: A large annotated dataset of RX and CT images from COVID-19 patients. arXiv 2020. [Google Scholar] [CrossRef]
- Liu, P.; Han, H.; Du, Y.; Zhu, H.; Li, Y.; Gu, F.; Xiao, H.; Li, J.; Zhao, C.; Xiao, L. Deep learning to segment pelvic bones: Large-scale CT datasets and baseline models. Int. J. Comput. Assist. Radiol. Surg. 2021, 16, 749–756. [Google Scholar] [CrossRef]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
- Gravel, P.; Beaudoin, G.; De Guise, J.A. A method for modeling noise in medical images. IEEE Trans. Med. Imaging 2004, 23, 1221–1232. [Google Scholar] [CrossRef]
- Van de Kraats, E.B.; Penney, G.P.; Tomazevic, D.; Van Walsum, T.; Niessen, W.J. Standardized evaluation methodology for 2-D-3-D registration. IEEE Trans. Med. Imaging 2005, 24, 1177–1189. [Google Scholar] [CrossRef]
- Miao, S.; Piat, S.; Fischer, P.; Tuysuzoglu, A.; Mewes, P.; Mansi, T.; Liao, R. Dilated FCN for multi-agent 2D/3D medical image registration. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
- Grupp, R.B.; Armand, M.; Taylor, R.H. Patch-based image similarity for intraoperative 2D/3D pelvis registration during periacetabular osteotomy. In OR 2.0 Context-Aware Operating Theaters, Computer Assisted Robotic Endoscopy, Clinical Image-Based Procedures, and Skin Image Analysis: Proceedings of the First International Workshop, OR 2.0 2018, 5th International Workshop, CARE 2018, 7th International Workshop, CLIP 2018, Third International Workshop, ISIC 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, 16 and 20 September 2018, Proceedings 5; Springer: Cham, Switzerland, 2018; pp. 153–163. [Google Scholar]
Region of Interest | CT Pixel Resolution | Voxel Spacing (mm) | Voxel Origin Offset (mm)
---|---|---|---
Head | 1024 × 1024 × 202 | 0.558 × 0.558 × 2 | (−286, −208, −759)
Chest | 512 × 512 × 176 | 0.694 × 0.694 × 2 | (−168, 140, −1029)
Pelvis | 512 × 512 × 350 | 0.740 × 0.740 × 0.8 | (−168, −189, −1411)
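The spacing and origin columns above define where the CT volume sits in physical space. The snippet below illustrates the usual index-to-world mapping implied by those columns; the axis ordering and sign conventions are assumptions for illustration, not taken from the paper.

```python
# Map a voxel index (i, j, k) to a physical coordinate (x, y, z) in mm using
# the pelvis spacing and origin offset from the table above.
import numpy as np

spacing = np.array([0.740, 0.740, 0.8])        # pelvis voxel spacing (mm)
origin  = np.array([-168.0, -189.0, -1411.0])  # pelvis voxel origin offset (mm)

def voxel_to_world(index):
    """Physical position (mm) of voxel index (i, j, k), assuming axis-aligned voxels."""
    return origin + np.asarray(index, dtype=float) * spacing

print(voxel_to_world((256, 256, 175)))  # roughly the center of the 512 × 512 × 350 volume
```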
Parameters | Head | Chest | Pelvis
---|---|---|---
dx range (mm) | [−105, −75] | [−25, 25] | [−25, 25]
dy range (mm) | [−15, 15] | [−15, 15] | [−25, 25]
dz range (mm) | [30, 90] | [−30, 30] | [−50, 50]
roll range (°) | [−15, 15] | [−15, 15] | [−105, −75]
pitch range (°) | [−15, 15] | [−15, 15] | [−15, 15]
yaw range (°) | [−5, 5] | [−5, 5] | [−5, 5]
Number of training images | 500,000 | 750,000 | 1,000,000
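The ranges above bound the rigid-transformation parameters used to synthesize labeled DRR training images. A minimal sketch of that sampling step follows; render_drr and save_training_pair are hypothetical placeholders for the actual DRR generator and data pipeline, and the chest ranges are copied from the table.

```python
# Sample random 6-DoF poses from the chest ranges to label synthetic DRRs.
import numpy as np

rng = np.random.default_rng(0)

chest_ranges = {                      # values from the table (mm / degrees)
    "dx": (-25, 25), "dy": (-15, 15), "dz": (-30, 30),
    "roll": (-15, 15), "pitch": (-15, 15), "yaw": (-5, 5),
}

def sample_pose(ranges):
    """Draw one pose uniformly from the per-parameter ranges."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in ranges.items()}

# for _ in range(750_000):                  # number of chest training images in the table
#     pose = sample_pose(chest_ranges)
#     drr = render_drr(ct_volume, pose)     # placeholder DRR renderer (e.g., Siddon ray casting)
#     save_training_pair(drr, pose)         # placeholder I/O helper
```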
Regions of Interest | Method | Mean Translation Error (mm) | Mean Rotation Error (°)
---|---|---|---
Head | Powell | 2.6 ± 3.1 | 1.3 ± 1.5
Head | CMA-ES | 2.3 ± 2.6 | 1.2 ± 1.2
Head | Resnet | 0.46 ± 0.50 | 0.51 ± 0.58
Head | Densenet | 0.40 ± 0.44 | 0.55 ± 0.60
Head | XPE-ST | 0.19 ± 0.16 | 0.15 ± 0.12
Chest | Powell | 4.2 ± 5.0 | 1.5 ± 1.8
Chest | CMA-ES | 4.1 ± 4.5 | 1.5 ± 1.6
Chest | Resnet | 1.0 ± 1.1 | 0.52 ± 0.54
Chest | Densenet | 0.77 ± 0.86 | 0.35 ± 0.40
Chest | XPE-ST | 0.34 ± 0.31 | 0.14 ± 0.13
Pelvis | Powell | 6.9 ± 8.1 | 1.3 ± 1.7
Pelvis | CMA-ES | 5.0 ± 5.3 | 0.99 ± 1.14
Pelvis | Resnet | 1.3 ± 1.5 | 0.31 ± 0.38
Pelvis | Densenet | 1.0 ± 1.3 | 0.32 ± 0.35
Pelvis | XPE-ST | 0.56 ± 0.49 | 0.14 ± 0.12
Regions of Interest | Method | mTRE (mm) | GFR | Time (s) | Memory Usage (GB)
---|---|---|---|---|---
Head | Powell | 4.3 ± 5.6 | 9.31% | 43.62 | /
Head | CMA-ES | 4.0 ± 4.9 | 8.63% | 46.87 | /
Head | Resnet | 0.79 ± 1.0 | 0.00% | 0.0067 | 12.98
Head | Densenet | 0.71 ± 0.75 | 0.00% | 0.022 | 36.62
Head | XPE-ST | 0.32 ± 0.28 | 0.00% | 0.015 | 28.04
Chest | Powell | 7.3 ± 7.1 | 14.23% | 21.04 | /
Chest | CMA-ES | 7.1 ± 6.9 | 13.89% | 23.29 | /
Chest | Resnet | 1.9 ± 2.2 | 0.00% | 0.0067 | 12.98
Chest | Densenet | 1.0 ± 1.3 | 0.00% | 0.022 | 36.62
Chest | XPE-ST | 0.41 ± 0.39 | 0.00% | 0.015 | 28.04
Pelvis | Powell | 12 ± 10 | 22.30% | 29.75 | /
Pelvis | CMA-ES | 8.0 ± 7.8 | 18.26% | 31.56 | /
Pelvis | Resnet | 2.0 ± 2.2 | 0.00% | 0.0067 | 12.98
Pelvis | Densenet | 1.6 ± 2.0 | 0.00% | 0.022 | 36.62
Pelvis | XPE-ST | 0.79 ± 0.73 | 0.00% | 0.015 | 28.04
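The mTRE and GFR columns follow the standardized 2D–3D evaluation methodology of van de Kraats et al. (cited in the references). The sketch below shows a common way to compute both; the 10 mm gross-failure threshold is an illustrative assumption, not necessarily the cutoff used in the paper.

```python
# mTRE: mean distance between target points mapped by the ground-truth and
# estimated rigid transforms. GFR: fraction of cases whose mTRE exceeds a threshold.
import numpy as np

def mtre(points, T_gt, T_est):
    """points: (N, 3) target points; T_gt, T_est: 4x4 rigid transform matrices."""
    pts_h = np.c_[points, np.ones(len(points))]           # homogeneous coordinates (N, 4)
    diff = (pts_h @ T_gt.T)[:, :3] - (pts_h @ T_est.T)[:, :3]
    return np.linalg.norm(diff, axis=1).mean()            # mean error in mm

def gross_failure_rate(mtre_values, threshold_mm=10.0):
    """Fraction of test cases counted as gross failures (assumed 10 mm cutoff)."""
    return (np.asarray(mtre_values) > threshold_mm).mean()
```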
Regions of Interest | Method | Mean Translation Error (mm), Low Noise | Mean Translation Error (mm), Medium Noise | Mean Translation Error (mm), High Noise | Mean Rotation Error (°), Low Noise | Mean Rotation Error (°), Medium Noise | Mean Rotation Error (°), High Noise
---|---|---|---|---|---|---|---
Head | Powell | 5.0 ± 6.2 | 10 ± 10 | 16 ± 14 | 3.0 ± 3.4 | 5.1 ± 5.0 | 9.0 ± 7.8
Head | CMA-ES | 4.6 ± 5.2 | 8.2 ± 9.1 | 14 ± 15 | 2.6 ± 3.1 | 4.6 ± 4.2 | 7.3 ± 7.7
Head | Resnet | 3.7 ± 4.5 | 5.2 ± 6.5 | 6.8 ± 8.0 | 2.0 ± 2.3 | 2.9 ± 3.3 | 3.6 ± 4.3
Head | Densenet | 2.4 ± 3.0 | 4.1 ± 5.2 | 6.7 ± 7.9 | 1.3 ± 2.1 | 2.7 ± 3.8 | 3.2 ± 4.5
Head | XPE-ST | 0.24 ± 0.20 | 0.36 ± 0.45 | 0.54 ± 0.88 | 0.16 ± 0.15 | 0.18 ± 0.28 | 0.24 ± 0.39
Chest | Powell | 7.7 ± 8.9 | 11 ± 13 | 16 ± 15 | 3.1 ± 4.2 | 6.0 ± 5.4 | 8.5 ± 8.2
Chest | CMA-ES | 7.3 ± 8.2 | 10 ± 12 | 14 ± 14 | 2.7 ± 3.5 | 4.2 ± 4.8 | 7.7 ± 8.1
Chest | Resnet | 3.1 ± 4.1 | 4.5 ± 5.8 | 6.6 ± 7.9 | 1.7 ± 2.3 | 3.2 ± 5.0 | 4.3 ± 6.2
Chest | Densenet | 5.8 ± 6.4 | 7.6 ± 9.2 | 10.9 ± 12.0 | 2.1 ± 2.8 | 3.8 ± 4.5 | 5.5 ± 7.3
Chest | XPE-ST | 0.56 ± 0.64 | 0.98 ± 1.11 | 1.7 ± 2.2 | 0.17 ± 0.20 | 0.19 ± 0.23 | 0.21 ± 0.34
Pelvis | Powell | 8.4 ± 8.9 | 13 ± 14 | 16 ± 15 | 3.0 ± 3.3 | 6.2 ± 5.3 | 9.7 ± 8.7
Pelvis | CMA-ES | 6.3 ± 7.5 | 10 ± 10 | 14 ± 12 | 2.8 ± 2.6 | 5.2 ± 6.0 | 8.0 ± 7.1
Pelvis | Resnet | 2.7 ± 3.4 | 4.5 ± 5.9 | 7.9 ± 9.1 | 1.7 ± 2.4 | 2.5 ± 3.9 | 4.4 ± 5.8
Pelvis | Densenet | 5.1 ± 6.7 | 7.9 ± 9.5 | 10 ± 12 | 2.4 ± 2.9 | 3.9 ± 4.8 | 5.7 ± 7.0
Pelvis | XPE-ST | 0.68 ± 0.89 | 0.91 ± 1.66 | 1.5 ± 2.0 | 0.19 ± 0.26 | 0.21 ± 0.31 | 0.24 ± 0.44
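The low/medium/high conditions above probe robustness to degraded X-ray quality. A typical way to synthesize such test images is mixed Poisson (quantum) plus Gaussian (electronic) noise, following the noise model of Gravel et al. (cited in the references); the photon counts and sigma values below are illustrative placeholders, not the paper's actual low/medium/high settings.

```python
# Add mixed Poisson + Gaussian noise to a DRR with intensities in [0, 1].
import numpy as np

rng = np.random.default_rng(0)

def add_xray_noise(img, photons_per_pixel=1e4, read_sigma=0.01):
    """Return a noisy copy of the DRR; lower photon counts mean stronger noise."""
    img = np.clip(img, 0.0, 1.0)
    quantum = rng.poisson(img * photons_per_pixel) / photons_per_pixel  # Poisson (quantum) noise
    noisy = quantum + rng.normal(0.0, read_sigma, img.shape)            # Gaussian (electronic) noise
    return np.clip(noisy, 0.0, 1.0)

# noise_levels = {"low": 1e4, "medium": 3e3, "high": 1e3}   # illustrative photon counts only
```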
Regions of Interest | DC | SE | FPN | Mean Translation Error (mm) | Mean Rotation Error (°)
---|---|---|---|---|---
Head | × | × | × | 0.48 ± 0.60 | 0.55 ± 0.69
Head | √ | × | × | 0.30 ± 0.30 | 0.31 ± 0.36
Head | √ | √ | × | 0.23 ± 0.20 | 0.24 ± 0.22
Head | √ | √ | √ | 0.19 ± 0.16 | 0.15 ± 0.12
Chest | × | × | × | 1.13 ± 1.30 | 0.52 ± 0.62
Chest | √ | × | × | 0.60 ± 0.71 | 0.29 ± 0.34
Chest | √ | √ | × | 0.45 ± 0.48 | 0.21 ± 0.22
Chest | √ | √ | √ | 0.34 ± 0.31 | 0.14 ± 0.13
Pelvis | × | × | × | 1.29 ± 1.38 | 0.51 ± 0.60
Pelvis | √ | × | × | 0.70 ± 0.87 | 0.32 ± 0.39
Pelvis | √ | √ | × | 0.62 ± 0.62 | 0.24 ± 0.26
Pelvis | √ | √ | √ | 0.56 ± 0.49 | 0.14 ± 0.12
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ye, W.; Wu, J.; Zhang, W.; Sun, L.; Dong, X.; Xu, S. A Robust Method for Real Time Intraoperative 2D and Preoperative 3D X-Ray Image Registration Based on an Enhanced Swin Transformer Framework. Bioengineering 2025, 12, 114. https://doi.org/10.3390/bioengineering12020114