1. Introduction
Radiation therapy (RT) is a cornerstone of cancer treatment, used for over half of all cancer patients worldwide, either as a standalone modality or in combination with surgery and chemotherapy [1]. The primary objective of RT is to maximize tumor control while minimizing radiation-induced damage to surrounding healthy tissues and organs at risk [2]. Achieving this balance relies heavily on accurate and reproducible patient setup in every treatment session. Even minor setup deviations can result in significant dosimetric errors, reducing tumor control probability or increasing the risk of toxicity [3]. To ensure setup accuracy, various immobilization devices, including thermoplastic masks and vacuum cushions, are widely used in clinical settings [4]. Advanced techniques such as Image-Guided Radiation Therapy (IGRT) and Augmented Reality (AR)-assisted systems have also been introduced to improve setup precision [5,6].
These technological advances have enabled highly accurate mechanical patient positioning. However, if the patient's posture is twisted between treatment planning and treatment delivery, the resulting misalignment cannot be corrected by simple couch translations or rotations, and manual patient setup involving direct contact with the patient is required. Maintaining consistent setup accuracy remains difficult, especially for complex treatment regions such as the chest and pelvis, where variability in patient anatomy, patient movement, and operator skill contributes to setup errors. Conventional setup training relies on lectures, static images, and limited hands-on practice; these methods do not adequately replicate clinical scenarios or develop spatial understanding [7,8]. Consequently, there is a growing demand for intuitive, immersive, and interactive training tools that bridge the gap between theoretical knowledge and clinical practice.
In recent years, AR technology has emerged as a powerful educational tool in healthcare. AR overlays digital 3D models and clinical data onto the physical environment, providing learners with real-time, spatially accurate, and interactive experiences [9,10]. Systematic reviews have shown that AR-based training can improve procedural performance, anatomical understanding, and learner engagement in disciplines such as surgery, anatomy, and nursing [7,11]. In the context of radiation oncology, AR offers promising applications for simulating patient setup and treatment workflows in a safe, repeatable setting [5,12]. For example, Microsoft HoloLens 2 enables users to visualize and manipulate anatomical models in real-world space, facilitating experiential learning without the risks associated with real patients [12,13].
However, research on AR-based radiation therapy training remains limited, particularly in evaluating its spatial accuracy, clinical integration, and training effectiveness. While some prototype systems have demonstrated feasibility, further validation is needed to establish their utility in real-world educational and clinical environments.
This study aims to develop and evaluate an AR-based training system for patient setup in radiation therapy using Microsoft HoloLens 2. High-resolution 3D anatomical models were generated from CT images, and surface models were acquired using photogrammetry-based methods, then fused for enhanced realism. The system integrates spatial anchors and QR code markers to ensure precise alignment between virtual models and physical phantoms.
2. Materials and Methods
2.1. Research Procedure
Figure 1 illustrates the overall workflow for developing an AR-based training application for patient setup in radiation therapy. In the model generation phase, two types of 3D models were created: CT-based anatomical models using 3D Slicer and surface models acquired through 3D scanning with Luma AI. Each method has distinct strengths and limitations: 3D Slicer provides high-resolution internal anatomy with smooth surfaces but lacks complete body contours, especially in the extremities, whereas Luma AI offers rapid full-body capture but with lower geometric fidelity and surface roughness. To address these limitations, the two models were integrated, combining anatomical accuracy and external realism to create a more complete training model.
In the AR development phase, the fused model was imported into Unity to build an application for HoloLens 2. The system enables users to view and interact with 3D patient models in mixed reality, offering an immersive, radiation-free training experience that enhances spatial awareness and setup skills.
2.2. 3D Model Generation
This study employed a multi-source 3D modeling strategy to develop an AR-based training system for radiation therapy. CT data processed with 3D Slicer provided accurate internal anatomy and smooth anterior surfaces. However, it lacked complete contours (e.g., incomplete arms) and did not capture surface textures, as X-ray attenuation reflects internal density rather than external micro-relief. In contrast, Luma AI rapidly reconstructs complete external contours from RGB videos, preserving gross body shape and appearance, though with lower geometric fidelity and occasional surface roughness or backside holes. By fusing CT (internal accuracy) with Luma AI (external completeness), the resulting model combines anatomical precision with external realism, thereby enhancing its pedagogical value for AR training.
2.2.1. AR Model Construction Using Luma AI and 3D Slicer
Figure 2 shows the front and back views of a 3D model generated using Luma AI, an AI-based platform for rapid 3D reconstruction. In this study, Luma AI was used to create visual models for the AR-based radiation therapy training system. We used a smartphone rear camera (60 fps) to record multi-angle videos of the phantom. The recording followed a circular path at a distance of 0.6–1.2 m and a height range of 0.7–1.6 m; each capture lasted 90–120 s without flash, and background clutter was cleared to improve reconstruction quality. AI algorithms then reconstructed and completed the geometry. The resulting model had some limitations, such as surface roughness and missing detail on the back. Even so, the external contours were sufficient for AR development in Unity, though further refinement is needed.
Figure 3 shows the high-precision 3D anatomical model generated from CT images using 3D Slicer. The model was developed to support the AR-based training system for radiation therapy. CT data were acquired from an anthropomorphic phantom. The acquisition parameters were: slice thickness 1 mm, resolution 512 × 512 pixels, and field-of-view (FOV) diameter 40 cm. The images were imported into 3D Slicer. Body surface segmentation was performed to reconstruct a smooth model with clear anatomical structures. Because of the limited CT scanning range, some limb structures, such as the arms, were incomplete.
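The body-surface segmentation itself was performed interactively in 3D Slicer. Purely as an illustration of the underlying idea, the sketch below shows a comparable threshold-and-marching-cubes pipeline in Python using SimpleITK, scikit-image, and trimesh; the HU threshold, file names, and choice of libraries are illustrative assumptions rather than the procedure actually used in 3D Slicer.

```python
# Hypothetical sketch: extract an external body surface from a CT volume,
# analogous to the body-surface segmentation performed in 3D Slicer.
# The threshold value, file paths, and libraries are illustrative assumptions.
import numpy as np
import SimpleITK as sitk
from skimage import measure
import trimesh

# Load the CT volume (e.g., a DICOM series previously converted to NRRD/NIfTI).
ct = sitk.ReadImage("phantom_ct.nrrd")
hu = sitk.GetArrayFromImage(ct)      # array shape (slices, rows, cols), values in HU
spacing = ct.GetSpacing()[::-1]      # reorder (x, y, z) -> (z, y, x) to match the array

# Simple body mask: everything denser than air (about -300 HU is a common cutoff).
body = hu > -300

# Marching cubes on the binary mask yields a triangulated body surface in mm.
verts, faces, _, _ = measure.marching_cubes(body.astype(np.uint8), level=0.5,
                                            spacing=spacing)

# Export as STL so the surface can be imported into Blender or Unity.
trimesh.Trimesh(vertices=verts, faces=faces).export("body_surface.stl")
```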
This study focused on the thoracic region, as chest setup is technically challenging and involves critical organs such as the lungs, heart, and major blood vessels. These structures are highly susceptible to displacement caused by respiration or posture changes, where even minor setup errors may result in dose distribution inaccuracies and increased risk.
To enhance spatial understanding of internal anatomy during training, we performed organ segmentation in 3D Slicer using the AutoSeg plugin (version d748fd3, 2024-10-24). AutoSeg is a deep learning-based tool that generates 3D anatomical models deterministically and without manual parameter adjustment, so the output is reproducible for a given CT input [14]. In this study, the segmented organs included the lungs, trachea, heart, sternum, spine, and ribs, which were imported into the AR system for visualization (Figure 4).
2.2.2. Blender-Based Model Fusion and Model Visualization in AR
Comparative analysis revealed that the CT-derived model had a smooth surface and clearly defined anatomical structures. However, due to its limited scanning range, it lacked peripheral regions such as the limbs. In contrast, the Luma AI model rapidly captured the complete external morphology, including the arms and legs, providing realistic posture cues for training, albeit with lower anatomical fidelity. Combining these models ensures that even when the thoracic region is the focus, students can still perceive posture–anatomy correspondence within a full-body context.
To overcome these limitations, this study used Blender to fuse the two models. In Blender, the Transform tools (Move, Rotate, Scale) and the Snap function were used for manual registration. The main torso and internal structures from the CT model were preserved, and the arm regions from the Luma AI model were aligned with the CT torso using anatomical landmarks such as the nose tip and shoulder joints, assisted by the 3D Cursor and Origin tools. After alignment, the models were merged with the Boolean Modifier (Union) and smoothed using the Voxel Remesher (voxel size 2.0 mm) and Subdivision Surface (Level 1), yielding the final integrated model shown in Figure 5. A scripted version of these fusion steps is sketched below.
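The minimal Blender Python (bpy) sketch below reproduces the merging and smoothing steps described above (Boolean union, 2.0 mm voxel remesh, level-1 Subdivision Surface). The object names and metre-scale units are illustrative assumptions, and it presumes the arm regions have already been cropped and registered; the actual fusion in this study was performed interactively in the Blender GUI.

```python
# Hypothetical Blender (bpy) sketch of the fusion steps described above:
# Boolean union of the CT torso and the Luma AI arms, a 2.0 mm voxel remesh,
# and a level-1 Subdivision Surface. Object names are illustrative.
import bpy

ct_model = bpy.data.objects["CT_Torso"]    # CT-derived torso with internal anatomy
arms = bpy.data.objects["LumaAI_Arms"]     # arm regions cropped from the Luma AI scan

# Boolean union: merge the Luma AI arms into the CT torso.
union = ct_model.modifiers.new(name="FuseArms", type='BOOLEAN')
union.operation = 'UNION'
union.object = arms
bpy.context.view_layer.objects.active = ct_model
bpy.ops.object.modifier_apply(modifier=union.name)

# Voxel remesh at 2.0 mm to obtain a watertight, evenly sampled surface
# (scene units assumed to be metres).
ct_model.data.remesh_voxel_size = 0.002
bpy.ops.object.voxel_remesh()

# Level-1 Subdivision Surface for final smoothing.
subdiv = ct_model.modifiers.new(name="Smooth", type='SUBSURF')
subdiv.levels = 1
subdiv.render_levels = 1
```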
2.3. AR Simulation Process for Radiotherapy Setup
The radiation therapy room is used for clinical purposes during the day, so students typically cannot access it. To enable setup training at any time, a general radiography couch with three-axis movement, normally used for X-ray imaging practice, was used as a simulated radiation therapy couch.
In this study, a central QR code (C-QRcode, 125 × 125 mm²) was used as the spatial reference for the AR-based radiotherapy patient setup simulation. Because the isocenter lies in mid-air, a QR code could not be attached at the isocenter itself; it was therefore placed on the floor directly below the isocenter. The QR code was printed with crosshairs on A4 paper, and a laser positioning device was used to align its center with the point on the floor directly below the isocenter of the treatment system, ensuring accurate correspondence between the virtual 3D model and the clinical setup.
During the simulation (Figure 6), the built-in camera of the HoloLens 2 detected the C-QRcode in real time, calculated its center coordinates relative to the AR world origin, and displayed the 3D patient model and the Varian TrueBeam model at the corresponding location. The procedure consisted of the following steps (a sketch of the underlying placement calculation is given after the list):
QR code positioning—Place the C-QRcode on the floor directly below the treatment system isocenter and verify its position using the laser positioning system.
Virtual model display—Wear the HoloLens 2, which detects the QR code and renders the 3D patient and Varian TrueBeam linear accelerator models at the corresponding position.
Physical alignment—Adjust the physical phantom on the treatment couch until it is spatially aligned with the virtual model displayed in the HoloLens 2.
Completion of simulation—Once the virtual and physical models are fully aligned, the radiotherapy patient setup simulation is completed.
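The core of step 2 is a simple placement calculation: because the C-QRcode lies on the floor directly below the isocenter, the virtual isocenter is obtained by shifting the detected QR center upward along the room vertical by the floor-to-isocenter height, and the patient model is translated so that its planned isocenter coincides with that point. The sketch below illustrates this idea with numpy; the Y-up axis convention (as in Unity) and the 1.3 m height are illustrative assumptions, and the actual application performs this placement inside the Unity/HoloLens 2 runtime.

```python
# Hypothetical sketch of the placement logic: the detected C-QRcode lies on the
# floor directly below the isocenter, so the virtual isocenter is the QR centre
# shifted up by the floor-to-isocenter height. Axis convention (Y up) and the
# height value are illustrative assumptions.
import numpy as np

def isocenter_from_qr(qr_center_m: np.ndarray,
                      up_axis: np.ndarray,
                      floor_to_isocenter_m: float) -> np.ndarray:
    """Return the virtual isocenter position in AR world coordinates (metres)."""
    up = up_axis / np.linalg.norm(up_axis)
    return qr_center_m + floor_to_isocenter_m * up

def place_model(model_vertices_m: np.ndarray,
                model_isocenter_m: np.ndarray,
                world_isocenter_m: np.ndarray) -> np.ndarray:
    """Translate the patient model so its planned isocenter lands on the world isocenter."""
    return model_vertices_m + (world_isocenter_m - model_isocenter_m)

# Example: QR code detected on the floor, 2.1 m in front of the AR world origin.
qr_center = np.array([0.0, 0.0, 2.1])
iso_world = isocenter_from_qr(qr_center, up_axis=np.array([0.0, 1.0, 0.0]),
                              floor_to_isocenter_m=1.3)   # assumed height
print("Virtual isocenter (m):", iso_world)
```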
2.4. System Accuracy Evaluation: QR Code Setup Stability
To evaluate the setup stability of the QR code-based tracking system in the AR training environment, three experimental assessments were performed (Figure 7).
First, time stability was evaluated by placing the QR code at a fixed distance of 1.0 m from the HoloLens 2 camera and recording its coordinates continuously for 10 s to detect possible temporal drift (Figure 7b). The standard deviation of these measurements ($\sigma_{\mathrm{time}}$) was used as the stability indicator.
Second, distance sensitivity was assessed by varying the distance between the HoloLens 2 and the QR code to 0.5 m, 1.0 m, and 1.5 m and recording the positional coordinates at each setting (Figure 7c). The variability was quantified using the standard deviation ($\sigma_{\mathrm{distance}}$).
Third, angle sensitivity was investigated by fixing the distance at 1.0 m and changing the viewing angle to 0°, 30°, 45°, and 60°, then recording the coordinate changes (Figure 7d). The standard deviation ($\sigma_{\mathrm{angle}}$) represented the angular sensitivity.
As illustrated in Figure 7, s* represents the designated reference point on the QR code used for coordinate measurements across all evaluations.
Temporal stability, distance sensitivity, and angular sensitivity were each evaluated using the standard deviation (S.D.) of repeated measurements as the stability indicator. To obtain a comprehensive measure of system reliability, these three variabilities were combined in quadrature to calculate the expanded uncertainty (U), following the ISO GUM methodology, yielding a single measure of temporal stability, distance sensitivity, and angular sensitivity within a unified framework.
Finally, the expanded uncertainty (U) was calculated as
$$U = k\sqrt{\sigma_{\mathrm{time}}^{2} + \sigma_{\mathrm{distance}}^{2} + \sigma_{\mathrm{angle}}^{2}},$$
where k = 2 corresponds to an approximately 95% confidence level. This provided a combined measure of setup uncertainty incorporating temporal, distance-related, and angular effects, offering a comprehensive evaluation of tracking reliability for AR-based radiotherapy training.
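As a worked illustration of this combination, the sketch below computes a positional standard deviation for each of the three conditions and combines them in quadrature with k = 2. The measurement arrays are randomly generated placeholders, and the particular scatter metric (3D root-mean-square deviation about the mean) is one reasonable choice rather than the exact definition used in the study.

```python
# Worked example of the ISO GUM-style quadrature combination described above.
# The coordinate arrays are placeholders for repeated QR-centre positions (mm)
# recorded by the HoloLens 2 under each condition.
import numpy as np

def positional_sd(samples_mm: np.ndarray) -> float:
    """Root-mean-square 3D scatter of repeated positions about their mean (mm)."""
    d = samples_mm - samples_mm.mean(axis=0)
    return float(np.sqrt((d ** 2).sum(axis=1).mean()))

rng = np.random.default_rng(0)
sigma_time     = positional_sd(rng.normal(0.0, 0.5, size=(100, 3)))  # 10 s at 1.0 m
sigma_distance = positional_sd(rng.normal(0.0, 0.8, size=(30, 3)))   # 0.5, 1.0, 1.5 m
sigma_angle    = positional_sd(rng.normal(0.0, 1.0, size=(40, 3)))   # 0-60 deg angles

k = 2  # coverage factor, approximately 95% confidence level
U = k * np.sqrt(sigma_time**2 + sigma_distance**2 + sigma_angle**2)
print(f"Expanded uncertainty U = +/-{U:.2f} mm")
```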
2.5. Model Overlap Evaluation: Coordinate Acquisition
To evaluate the spatial alignment between the virtual and physical models, corresponding feature point coordinates were obtained in both environments.
To assess system performance, repeated setup simulations were conducted using key anatomical landmarks, and the resulting spatial deviations were quantitatively analyzed. In the virtual environment (Figure 8a), three anatomical feature points were selected on the 3D model: the nose tip, left hip, and right hip. Their real-time 3D coordinates were obtained using a Unity C# script and designated as Virtual Points 1–3.
In the physical environment (Figure 8b), QR codes were placed at the corresponding anatomical locations on the anthropomorphic phantom. The center coordinates of each QR code were recorded using the HoloLens 2 camera and designated as Real Points 1–3.
The center of the C-QRcode was used as the common reference point for both environments, enabling direct comparison of corresponding feature points. These coordinates formed the basis for spatial alignment assessment and deviation analysis.
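A minimal sketch of the resulting deviation analysis is shown below, assuming both point sets have already been expressed relative to the C-QRcode center; the coordinate values are placeholders, not measured data.

```python
# Minimal sketch of the overlap evaluation: per-axis and Euclidean deviations
# between corresponding virtual and real feature points, both expressed
# relative to the C-QRcode centre. Coordinates (mm) are placeholders.
import numpy as np

labels = ["nose tip", "left hip", "right hip"]
virtual_mm = np.array([[   0.0, 210.0, 950.0],
                       [-150.0,  40.0, 180.0],
                       [ 150.0,  40.0, 180.0]])
real_mm    = np.array([[   8.0, 225.0, 962.0],
                       [-139.0,  52.0, 171.0],
                       [ 162.0,  55.0, 170.0]])

per_axis_mm = real_mm - virtual_mm               # signed X/Y/Z deviation per landmark
euclid_mm = np.linalg.norm(per_axis_mm, axis=1)  # 3D deviation per landmark
for name, d, e in zip(labels, per_axis_mm, euclid_mm):
    print(f"{name:>9}: dX={d[0]:+7.1f}  dY={d[1]:+7.1f}  dZ={d[2]:+7.1f}  |d|={e:6.1f} mm")
```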
2.6. Example of a Patient Setup Training Session for Students
Using this system, a medical physics student with no prior clinical experience in radiation therapy conducted setup training, which was repeated five times. The setup was evaluated by measuring the positional deviations, along the X, Y, and Z axes, of three QR codes placed on the nose tip and pelvis (left and right hips) of the patient phantom.
Three non-collinear anatomical landmarks (nose tip, left hip, right hip) were selected to define spatial orientation and position. This preliminary evaluation involved one student (N = 5 trials) and was sufficient for a feasibility demonstration. However, future studies should include more landmarks and a larger cohort for robust evaluation.
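For the repeated-trial analysis, the per-axis deviations of each landmark can be summarized as mean ± sample standard deviation over the five trials, as sketched below with placeholder values.

```python
# Sketch of the per-axis summary over repeated setup trials (placeholder data):
# rows are trials, columns are X/Y/Z deviations of one landmark in mm.
import numpy as np

nose_tip_dev_mm = np.array([[ 12.0,  -8.0, 21.0],
                            [  9.5, -11.0, 18.0],
                            [ 15.0,  -6.5, 25.0],
                            [ 11.0,  -9.0, 19.5],
                            [ 13.5,  -7.0, 22.0]])

mean_mm = nose_tip_dev_mm.mean(axis=0)
sd_mm = nose_tip_dev_mm.std(axis=0, ddof=1)   # sample SD across the N = 5 trials
for axis, m, s in zip("XYZ", mean_mm, sd_mm):
    print(f"{axis}: {m:+5.1f} +/- {s:4.1f} mm")
```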
4. Discussion
We developed an AR-based patient setup simulation system using HoloLens 2 and evaluated its spatial accuracy. QR code tracking demonstrated good stability with an expanded uncertainty of ±2.74 mm (millimeter-level variation). However, setup simulations produced centimeter-level deviations along the X, Y, and Z axes. The primary causes of this discrepancy are likely the students' limited patient setup skills, the manual placement of the QR markers, and the absence of real-time alignment verification. In addition, several practical factors may have amplified these deviations. First, visual acuity and familiarity with AR interfaces differ among novice students, which can affect their ability to perceive subtle misalignments. Second, the anthropomorphic phantom has curved surfaces, making it appear different from various viewing angles and complicating alignment. Third, the phantom is relatively heavy, so fine adjustments are physically difficult and often lead to overshooting. These factors explain why deviations increased to the centimeter level, even though QR code tracking itself maintained millimeter-level stability. Future work will incorporate error propagation analysis and confidence intervals to more systematically separate user-related variability from system-related limitations.
The system used a single QR code as an anchor for simplicity, low cost, and classroom feasibility. However, this approach amplifies slight misplacements when the headset changes perspective, particularly along the Y (vertical) and Z (depth) axes, and provides no dynamic feedback mechanism to correct drift. Future work will investigate multi-marker templates and hybrid tracking to improve accuracy.
The images from the HoloLens 2 view in Figure 8 and Figure 9 were of relatively low resolution, limiting visualization and spatial analysis. This was primarily due to network fluctuations during transmission. Future studies could use wired or high-bandwidth connections, or local high-definition caching, to improve image quality.
Prior studies using marker-based AR exhibited smaller errors: Tarutani et al. [15] achieved sub-millimeter accuracy (0.5–0.8 mm), but at the expense of greatly increased setup time, often exceeding 10 min per session. In contrast, our system showed centimeter-level deviations (up to 33 mm), yet a complete trial could be finished within minutes. This trade-off indicates that while Tarutani's method is clinically precise, our approach prioritizes efficiency and accessibility, which are more suitable for classroom training. Johnson et al. [16] reported 3.0 ± 1.5 mm accuracy using VSLAM with HoloLens 2. While more precise than our results, VSLAM depends heavily on environmental features and is prone to drift in texture-poor areas. By contrast, our QR code approach provides deterministic anchoring with lower computational demand, yielding millimeter-level stability (expanded uncertainty ±2.74 mm) despite user variability, which may be more robust and cost-effective for training contexts. Compared with Tarutani et al. [15] and Johnson et al. [16], our results demonstrated a novel approach: by applying the ISO GUM methodology, we combined temporal stability, distance sensitivity, and angular sensitivity into a single expanded uncertainty (U). This integrated evaluation highlights the novelty of our study, as previous research often reported these factors separately.
More robust alternatives have emerged in the recent literature. Zhai et al. [17] combined AR with point-cloud ICP registration, achieving 0.6 ± 0.2 mm accuracy. This precision, while clinically impressive, requires depth-equipped hardware and intensive computation, limiting feasibility for widespread educational deployment. Zhang et al. [18] reported 1.6 ± 0.9 mm errors using structured-light surface imaging with AR overlay, while also reducing setup time compared with CBCT workflows. Their approach highlights a clinically viable balance of accuracy and efficiency. Our system, although less precise, achieved comparable training benefits with far simpler hardware, underscoring that meaningful outcomes can be realized in education even without clinical-grade precision. Future work could explore integrating elements of structured light or point-cloud tracking to narrow the accuracy gap while retaining the feasibility advantages demonstrated here.
In contrast, surface-guided radiation therapy (SGRT) has already become widely adopted in clinical practice, covering more than 40% of treatment fractions in the U.S. Rudat et al. demonstrated inter-fraction setup errors of 3.6 mm (SGRT) versus 4.5 mm (laser/tattoos) in thorax, abdomen, and pelvis setups (p = 0.001) [19]. Oliver et al. showed that augmenting SGRT with real-time holographic outlines (Postural Video™, VisionRT, London, UK) reduced setup time by 28% and minimized repeat imaging by 63% [20]. These findings highlight that continuous surface feedback is critical for achieving millimeter-level accuracy, something entirely absent in our current AR workflow. Compared with prior methods such as SGRT combined with CBCT, our AR-based approach achieved shorter setup times, which highlights its educational practicality despite its lower accuracy.
Hardware platforms also matter. Frisk et al. [21] reported spine phantom placement errors of 1–2 mm using Magic Leap 2 with dual RGB cameras and an active depth sensor, which is far more precise than our HoloLens 2 system, though less readily integrated into radiotherapy workflows.
Despite these limitations, participants in pilot training sessions described the fused anatomical visualization as "intuitively linking posture with anatomy." This educational benefit mirrors Wang et al.'s findings that AR improves patient understanding and comfort [22], and Zhang et al.'s workflow that reduced training time by over 30% [18]. However, the analysis of the student setup training results appeared relatively subjective and showed a larger standard deviation than the other measurements. Future studies should include more simulation trials with a larger student cohort to reduce variability and improve reliability.
The limitations of this study are summarized as follows:
The observed deviations may partly reflect the student’s limited setup skills. Future validation with experienced clinicians will help distinguish skill-related effects from methodological limitations.
Moreover, typical thoracic tumors measure approximately 20–40 mm. In our study, setup deviations reached up to 33 mm, which would compromise tumor targeting in clinical settings.
In addition, this study did not include an evaluation of the gross tumor volume (GTV). Since the primary aim was setup training, tumor-related assessment was outside the study scope. We therefore acknowledge both the relatively large alignment errors and the absence of GTV evaluation as important limitations.
Because this study is primarily a methodological proposal, the training evaluation was limited to one novice participant. Future studies with multiple students and larger sample sizes will be required to validate the statistical significance of the results.
Finally, the system provides a real-time display of the virtual models but lacks automatic feedback to confirm alignment accuracy, which remains a key limitation of the current implementation.
Future improvements will focus on:
Passive–active hybrid markers: rigid, multi-marker placement templates to reduce manual variability.
Real-time depth fusion: structured light or LiDAR-based point-cloud ICP at 10 Hz, akin to Zhai et al.'s method [17].
Closed-loop imaging verification: automatic low-dose CBCT or surface imaging triggering if the alignment error exceeds 5 mm, as in Zhang et al.'s workflow [18].
Curriculum integration: set clear goals and assessment criteria, and pilot the system in laboratory settings before integrating it into pre-clinical modules with instructor guides and multi-learner/multi-device support.
We will also ensure privacy and cybersecurity.
In summary, while QR code tracking shows millimeter-level stability, the observed centimeter-level misalignments indicate that the current workflow is not sufficient for clinical-grade setup. In contrast, SGRT and depth-enhanced AR workflows consistently demonstrate sub-5 mm accuracy. Integrating these technologies, together with improved marker design, tracking, and feedback, could transform our system into a clinically viable AR-assisted setup solution.
5. Conclusions
This study developed an AR-based training system using HoloLens 2 to support radiation therapy setup, offering an interactive and realistic environment. Accuracy evaluation showed that QR code tracking achieved millimeter-level variation, with an expanded uncertainty of ±2.74 mm, indicating good stability. However, during student patient setup training, centimeter-level deviations were observed along the X, Y, and Z axes, mainly due to the student's limited setup skills, the manual placement of the QR codes, and the absence of real-time verification. The findings support the feasibility of the proposed AR-based training system. Further validation with larger cohorts will be necessary to confirm statistical robustness.
This patient setup training system can be implemented outside a radiation treatment room whenever a three-axis movable couch and a patient phantom are available, expanding its range of applications. The ability to repeat training without time constraints will enhance user proficiency. This study contributes to immersive educational tools for radiation oncology. AR technology shows potential to improve staff training, reduce setup errors, and enhance treatment safety.
Future research will focus on improving tracking stability, workflow efficiency, and real-time feedback, as well as expanding into other treatment areas.