#### *2.4. Outcome Measures*

The accuracy of the primary radiological screw insertion was the main parameter assessed in this study. Accuracy was measured by the amount of deviation from the plan using two continuous variables: (1) deviation from planned entry point (in mm); and (2) the 3D angular deviation (in degrees) from the planned trajectory. For the 3DPG arm, the deviation was measured following level-by-level registration of the postoperative CT with the preoperative CT, which included the planned trajectories. For the CAS study arm, the postoperative CT was registered with the intraoperative CT, which included the stored CAS trajectories (Figure 3). As a result of the registration per vertebra (single-level registration of each vertebra), the patient's alignment (supine vs. prone) and spinal mobility did not affect the final analysis.
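As an illustration of the two outcome variables, the following minimal sketch computes an entry-point deviation and a 3D angular deviation from a planned and an achieved trajectory. It assumes that both trajectories are already expressed in the same registered coordinate system and that each is described by an entry point and a direction vector; the function names and coordinates are hypothetical and do not represent the software used in the study.

```python
import math


def entry_point_deviation(planned_entry, actual_entry):
    """Euclidean distance (mm) between the planned and achieved entry points."""
    return math.dist(planned_entry, actual_entry)


def angular_deviation(planned_dir, actual_dir):
    """3D angle (degrees) between the planned and achieved screw axes."""
    dot = sum(p * a for p, a in zip(planned_dir, actual_dir))
    norm = math.hypot(*planned_dir) * math.hypot(*actual_dir)
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


# Hypothetical coordinates (mm) in the registered vertebra frame.
planned_entry = (10.0, 20.0, 30.0)
actual_entry = (11.0, 22.0, 32.0)   # offset of (1, 2, 2) mm
planned_dir = (0.0, 0.0, 1.0)
actual_dir = (0.0, 1.0, 1.0)        # tilted in the y-z plane

entry_dev_mm = entry_point_deviation(planned_entry, actual_entry)
angle_dev_deg = angular_deviation(planned_dir, actual_dir)
print(f"entry-point deviation: {entry_dev_mm:.2f} mm")
print(f"angular deviation: {angle_dev_deg:.2f} degrees")
```

Because the angle is computed between direction vectors, it does not depend on the position of the entry point, so the two variables capture translation and rotation separately.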

**Figure 3.** Schematic overview of the study depicting preoperative planning, preoperative measurements, and postoperative assessment procedures. 3DPG: 3D-printed guides; CAS: computer-assisted surgery; 3D: three-dimensional; PostOp CT: postoperative CT.

During the analysis, computerized registration and measurements were performed wherever possible. In technical terms, this procedure entailed (1) automated surface-based model registration using the Iterative Closest Point algorithm and (2) the use of automatically fitted analytical cylinders at the screw positions. These cylinders were automatically placed over the screw objects to prevent any assessment bias. Screw segmentation was done using standardized Hounsfield thresholds to eliminate bias by segmentation. The 3D deviation analysis has previously been shown to have very high inter-rater reliability, with an intraclass correlation coefficient of 0.99 [23].
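The idea of deriving a screw axis automatically from thresholded image data can be sketched as follows. This is a simplified stand-in under stated assumptions, not the authors' pipeline: the threshold value is hypothetical, the voxel data are synthetic, and a principal-axis estimate (power iteration on the covariance matrix) replaces the analytical cylinder fit, whose implementation is not described in the text.

```python
import math

SCREW_HU_THRESHOLD = 2500  # hypothetical value; the study's thresholds are not given


def segment_screw(voxels, threshold=SCREW_HU_THRESHOLD):
    """Keep the coordinates of voxels whose Hounsfield value reaches the threshold."""
    return [(x, y, z) for x, y, z, hu in voxels if hu >= threshold]


def principal_axis(points, iterations=100):
    """Return the centroid and dominant direction of a point cloud.

    Power iteration on the 3x3 covariance matrix; the start vector (1, 1, 1)
    must not be orthogonal to the true axis for the iteration to converge.
    """
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    centered = [[p[i] - centroid[i] for i in range(3)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(3)]
           for i in range(3)]
    axis = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        axis = [sum(cov[i][j] * axis[j] for j in range(3)) for i in range(3)]
        length = math.sqrt(sum(a * a for a in axis))
        axis = [a / length for a in axis]
    return centroid, axis


# Synthetic screw: 21 stations along a known direction, three voxels wide.
true_dir = (1 / 3, 2 / 3, 2 / 3)
side = (2 / math.sqrt(5), -1 / math.sqrt(5), 0.0)  # perpendicular to true_dir
voxels = []
for t in range(21):
    for r in (-1.5, 0.0, 1.5):
        x, y, z = (t * true_dir[i] + r * side[i] for i in range(3))
        voxels.append((x, y, z, 3000))  # metal-like intensity
voxels.append((5.0, 5.0, 5.0, 700))     # bone-like voxels, below threshold
voxels.append((40.0, -10.0, 3.0, 1200))

screw_points = segment_screw(voxels)
centroid, axis = principal_axis(screw_points)
alignment = abs(sum(a * d for a, d in zip(axis, true_dir)))
print(f"{len(screw_points)} screw voxels, alignment with true axis: {alignment:.6f}")
```

Fixing the threshold in advance means that the same voxels are selected regardless of who runs the analysis, which is the property the standardized segmentation is meant to guarantee.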

#### *2.5. Statistics*

The objective of this study was to assess the non-inferiority of the 3DPG navigational technique relative to CAS. The calculation of the sample size was based on preselected margins of non-inferiority: 1 mm for entry-point accuracy and 3° for angular accuracy. In order to obtain representative margins and in the absence of representative published 3D deviation data, we conducted a small pilot 3D simulation in which we measured the maximum amount of screw rotation until pedicle wall breach occurred. The upper limit of the rotation in which 99% of screws fitted within the pedicle was calculated as 3.29°. The allowable margins of error for screw placement reported in the literature are around 1 mm and 5°, respectively, for translation and rotation [32,33]. Although the metrics reported in the literature and those obtained in our pilot experiment were slightly different, they were in line with our predetermined margins. Hence, we concluded that they justified our selected margins of non-inferiority.

The sample size of each group was calculated according to the accuracy data derived from our pilot study along with additional pilot data gathered during CAS-assisted surgery. Because the current study focused solely on radiological accuracy outcomes obtained through 3D deviation analysis and because we assumed that the screws were independent, we calculated the screw-based sample size. Applying our assumptions in the calculation of sample size, we found that 36 screws would provide 90% power for determining non-inferiority at a significance level of 0.05. To compensate for unanticipated problems such as loss to follow-up and equipment malfunction, we included a dropout of 10% and therefore aimed to include a minimum of 40 screws (20 in each study arm). Considering the average number of cervical screws used per patient in our center, we aimed to include 10 patients in the study. The power calculation was performed using PASS software (NCSS, LLC, Kaysville, UT, USA).
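The structure of such a calculation can be sketched with the standard normal-approximation formula for a non-inferiority comparison of two means. The sketch below is not a reproduction of the PASS calculation: the standard deviation of 1.0 mm is a purely illustrative assumption (the pilot estimate is not reported in the text), the significance level is treated as one-sided, and the true difference between the arms is assumed to be zero.

```python
import math
from statistics import NormalDist


def screws_per_arm(sd, margin, true_diff=0.0, alpha=0.05, power=0.90):
    """Normal-approximation sample size per arm for a non-inferiority test
    of two means with a common standard deviation `sd`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (sd * (z_alpha + z_beta) / (margin - true_diff)) ** 2)


def inflate_for_dropout(n_total, dropout):
    """Number to enrol so that n_total remain after the expected dropout."""
    # round() guards against floating-point noise before taking the ceiling.
    return math.ceil(round(n_total / (1 - dropout), 9))


# Illustrative input only: an SD of 1.0 mm is an assumption, not a reported value.
n_arm = screws_per_arm(sd=1.0, margin=1.0)
n_total = 2 * n_arm
n_enrol = inflate_for_dropout(n_total, dropout=0.10)
print(n_arm, n_total, n_enrol)
```

With these illustrative inputs the approximation returns 18 screws per arm, 36 in total, and 40 after allowing for 10% dropout. A calculation based on the t-distribution, or on a different variance estimate, would give somewhat different numbers.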

All of the accuracy data were presented as descriptive statistics, expressed as median and interquartile range (IQR) values for non-normally distributed parameters. Non-inferiority was assessed by calculating the mean and 95% CI values for the difference between 3DPG and CAS using a one-sample t-test and comparing the limits of the CI with the predefined non-inferiority margins. The decision to reject the null hypothesis was made by determining whether the upper bound of the CI crossed the non-inferiority margin.
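The decision rule described above can be sketched as follows. The example assumes two independent samples of deviations and uses a normal approximation for the confidence interval in place of the t-based interval produced by SPSS; the deviation values are hypothetical and serve only to make the sketch runnable.

```python
import math
from statistics import NormalDist, mean, variance


def difference_ci(test, control, confidence=0.95):
    """Two-sided CI for mean(test) - mean(control), normal approximation."""
    diff = mean(test) - mean(control)
    se = math.sqrt(variance(test) / len(test) + variance(control) / len(control))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * se, diff + z * se


def is_noninferior(ci, margin):
    """Non-inferiority holds when the upper CI bound stays below the margin
    (a smaller deviation is the better outcome)."""
    return ci[1] < margin


# Hypothetical entry-point deviations in mm; not study data.
test_arm = [1.0, 2.0, 3.0, 2.0, 1.5, 2.5]
control_arm = [1.2, 2.2, 2.8, 1.8, 1.6, 2.4]

ci = difference_ci(test_arm, control_arm)
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f} mm")
print("non-inferior at 1 mm:", is_noninferior(ci, margin=1.0))
```

Only the upper bound enters the decision because a positive difference (test minus control) corresponds to a larger deviation, and therefore a worse result, in the test arm.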

The final statistical analysis was performed using the SPSS Statistics program (SPSS Version 23.0 for Windows, IBM, Armonk, NY, USA).

#### **3. Results**

#### *3.1. Patient Characteristics*

Between June 2019 and December 2020, all of the consecutive patients referred for multi-level cervical and thoracic spine fixation were prospectively enrolled. A total of 10 patients were initially enrolled to meet the sample size calculations. A loss of CAS trajectory data due to storage failure, which exceeded the calculated dropout rate, resulted in the enrolment of three additional patients after approval by the institutional board had been renewed. Altogether, the number of patients suitable for the final analysis was 10. Because the ultimate number of screws inserted per patient turned out to be higher than expected, the final number of screws per study arm was 30.

The mean age of all patients was 56 years (range 16–82) and 5 of the 10 patients were women. The cohort presented with a spectrum of indications, including degenerative disease, osteoporotic fractures, rheumatoid arthritis, Klippel–Feil syndrome, and tumor.

#### *3.2. Descriptive Statistics*

The median entry-point deviation was 1.8 mm (IQR: 1.0 mm–2.9 mm) in the cohort instrumented with 3DPG, and 1.8 mm (IQR: 1.0 mm–3.2 mm) in the cohort instrumented with CAS. The angular deviation was 5.7° (IQR: 2.9°–9.1°) in the cohort instrumented with 3DPG and 5.3° (IQR: 3.8°–8.1°) in the cohort instrumented with CAS (Figure 4).

#### *3.3. Non-Inferiority Assessment*

The 95% confidence interval (CI) for the difference in means between 3DPG and CAS (3DPG − CAS) was −1.01 mm to 0.49 mm. Therefore, the entry-point accuracy of 3DPG demonstrated non-inferiority relative to CAS, as the upper bound of the CI did not cross the predetermined non-inferiority margin of 1 mm (*p* < 0.05), as visualized in Figure 5. For angular accuracy, the 95% CI for the true difference between the means was −2.30° to 1.35°. Therefore, the angular accuracy of 3DPG was also found to be non-inferior relative to CAS, as the upper bound of the CI did not cross the predetermined non-inferiority margin of 3° (*p* < 0.05) (Figure 5).

**Figure 5.** Graph displaying non-inferiority of 3DPG (test) relative to CAS (active control). The error bars demonstrate two-sided CIs, displaying both the lower and upper bounds of the CI. For both outcome measures, 3DPG was non-inferior relative to CAS, given that the entire CI was below the predetermined non-inferiority margins (∆). It should be noted that a smaller outcome value (less deviation) indicated a better outcome. Therefore, areas to the left indicated better outcomes, and areas to the right indicated worse outcomes.
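The reported intervals can be unpacked with simple arithmetic. The sketch below assumes that each interval is symmetric about the point estimate, as is the case for a t-based CI; under that assumption the midpoint recovers the estimated mean difference and the distance from the upper bound to the margin shows how much room remains. The function name is hypothetical.

```python
def summarize_ci(ci_lower, ci_upper, margin):
    """Midpoint, half-width and remaining distance to the margin for a
    symmetric two-sided CI of the difference (3DPG minus CAS)."""
    midpoint = (ci_lower + ci_upper) / 2
    half_width = (ci_upper - ci_lower) / 2
    headroom = margin - ci_upper
    return round(midpoint, 3), round(half_width, 3), round(headroom, 3)


# Bounds and margins as reported in the text.
entry = summarize_ci(-1.01, 0.49, margin=1.0)   # mm
angle = summarize_ci(-2.30, 1.35, margin=3.0)   # degrees

print("entry point (mm): midpoint %.3f, half-width %.3f, headroom %.3f" % entry)
print("angle (degrees):  midpoint %.3f, half-width %.3f, headroom %.3f" % angle)
```

Under the symmetry assumption, both midpoints are negative, so the point estimates of the mean deviation are slightly lower for 3DPG, while both intervals include zero.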

#### **4. Discussion**

In this prospective randomized clinical trial (RCT), we compared the accuracy of spinal screw insertion using 3DPG and CAS. Our results showed that screw insertion accuracy achieved using 3DPG was similar and non-inferior to that obtained with CAS.

To the best of our knowledge, this RCT is the first to compare the accuracy of spinal screw insertion using 3DPG and CAS. Previous studies have compared 3DPG and freehand screw insertion, with their findings leading to a general consensus that the 3DPG technology significantly reduces the incidence of pedicle screw malpositioning [29,30]. Moreover, significant reductions in radiation exposure and in the time taken for screw implantation using this technology have been reported [28]. We only found one study by Fan et al. that compared groups of patients instrumented with CAS and 3DPG [31]. This prospective cohort study compared the use of robot-assisted pedicle screw insertion with 3DPG, CAS, and free-hand fluoroscopy-controlled screw insertion. The study demonstrated that the accuracy of "acceptable" screw placement in the 3DPG-guided group (95.52%) was slightly higher than that in the CAS group (90.60%), with no significant difference found between the two groups. These results suggest that both techniques yield similar degrees of accuracy; however, a systematic, randomized comparison was not performed in the above study. It is generally acknowledged that pedicle screw insertion has been substantially improved through the use of CAS technology compared with free-hand screw insertion. Hence, it was our belief that this accuracy standard should be the reference for novel navigational technologies such as 3DPG. Therefore, for 3DPG to become accepted as a viable navigational tool and to optimize safety, an RCT should be conducted between 3DPG and CAS with the aim of determining, at minimum, non-inferior screw insertion accuracy, as was done in the current study.

The results of our randomized study indicated similar degrees of accuracy for both techniques. Compared with the accuracy of CAS, that of 3DPG was non-inferior for both of the assessed parameters. In fact, the upper limits of our 95% CI were 0.50 mm and 1.35°, which were well below the respective non-inferiority limits of 1 mm and 3°. There was no indication of 3DPG being superior to CAS, as the CI upper limits were above zero. However, we believe that a sufficiently powered study could lead to a finding of the superiority of 3DPG in specific subgroups. In particular, the use of CAS in the highly mobile upper cervical spine may be associated with increasing errors when operating further away from the reference array, with surgical manipulation inducing slight realignments of vertebral levels [34]. As our study design and methodology for measuring accuracy differed considerably from those of Fan et al. (they did not report on quantitative differences between planned and actual screw directions), a comparison of the results of the two studies presents challenges [31]. Nevertheless, it can be concluded that 3DPG again demonstrates a high degree of accuracy and that the findings of the current study confirm with more confidence that the accuracy of screw insertion using 3DPG is similar and non-inferior to that of CAS.

#### *4.1. Implications of the Study's Findings*

In light of the findings of this study, 3DPG can be considered to be an effective and accurate alternative navigational technology relative to CAS for cervical and thoracic spine fixation. It is important to point out that the results obtained using 3DPG cannot be guaranteed in advance when implementing the same technique and that our surgical teams underwent a learning curve, performing several cadaveric surgeries that served as training sessions. Furthermore, our comprehensive point-of-care 3D planning and printing facility has evolved over time, and we have acquired sufficient professional knowledge and competent staff with extensive training, enabling us to guarantee high-quality performance standards in full compliance with the EU medical device regulation (EU 2017/745). Centers that lack these facilities should be made aware of the high-quality standards that are required, or they should find suitable commercial partners for VSP and PSI design. However, given the technology's novelty, its commercial availability is currently limited.

Although 3D technology has great potential, the technique is not suitable for all cases. This particularly applies to trauma cases that require immediate fixation surgery. Since the technology described here requires pre-planning, manufacturing, and sterilization of 3D-printed instrumentation, the whole process takes a minimum of 3–4 days. Additionally, the current 3D technology is not yet suitable for minimally invasive approaches, even though minimally invasive screw insertion is gaining popularity as a means of minimizing tissue trauma. Three-dimensional printed guides for minimally invasive approaches remain a largely unexplored area. There are, however, a few examples, of which the SpineBox system is the best known [35]. Further studies are needed in this area to compare these new approaches with CAS technology.

#### *4.2. Study Limitations*

At our center, the CAS system was used in combination with a 3D fluoroscopy C-arm capable of acquiring an intraoperative cone beam CT scan. Current state-of-the-art CAS systems are, however, often installed with the newer O-arms, which potentially provide enhanced image quality. Although both systems offer high levels of accuracy, there is some evidence that the use of O-arms with CAS improves the level of accuracy [36]. Therefore, our results are not generalizable to all CAS set-ups. Consequently, future studies that entail direct comparisons of 3DPG and O-arm-equipped CAS are warranted.

Within current 3D fluoroscopy CAS systems, the screw trajectory is defined 'on the spot', and not according to a predefined screw plan. Therefore, in the present study, we could not measure the accuracy of CAS with respect to a preoperative plan; instead, we measured its accuracy using the saved, intraoperatively acquired trajectories and CT image data. It is possible that the surgeon considered the trajectory associated with the acquired hole to be sufficient but not optimal. If accuracy is defined according to the extent of deviation from the optimal trajectory, the study could entail a slight overestimation of the actual accuracy of CAS. This again highlights the major advantage of 3DPG: the optimal trajectory can be selected and considered preoperatively instead of being defined during the time-constrained and intensive period of surgery.

A consideration of more clinical variables, such as infection rates, intraoperative blood loss, duration of the operation, and radiation dosage, was beyond the scope of the current study design. Additionally, the study was insufficiently powered for analyzing subgroups with different screw techniques (lateral mass or pedicle). Consequently, these variables were not included in the comparison of the two techniques. Therefore, prospective RCTs with larger sample sizes are still required for a comprehensive assessment of these two techniques. It is likely that higher-powered clinical trials are necessary to validate our findings with a higher degree of confidence and to evaluate whether the inclusion of other clinical parameters, such as surgery time (total/per screw), supports the use of one or the other technique.

Our analysis of both end points was performed on a per-protocol basis. During the study, 3 patients dropped out due to data loss, which should be prevented in future studies by keeping multiple copies or using cloud storage. In addition, some of the vertebral levels that we had planned to include in the fusion were not instrumented in cases in which sufficient fixation had already been achieved. Because these screws were not inserted, they could not be evaluated, making an intention-to-treat analysis impossible to perform. Therefore, only planned screws that were actually in situ and visible in the postoperative image data were included in the analysis. Furthermore, in our opinion, an intention-to-treat analysis was not appropriate for the current study design because randomization pertained to the level of patient side rather than to patients. An intention-to-treat analysis would, therefore, only be necessary if, for example, the assigned sides were reversed, which did not occur in this study.

Within-patient clustering was not considered in this study. In the future, variance and thus confidence intervals need to be inflated to account for the effect of within-patient clustering for two main reasons. Firstly, screws within patients are more likely to resemble each other than screws across different patients (violating the independence assumption). Secondly, treatment is assigned on the level of patient side, not on the level of the screw. Therefore, to accurately reflect dependencies in the data, a cluster-randomized design (whereby patients are the clusters) will be used to appropriately power future studies.
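The size of the required inflation can be illustrated with the standard design effect for equally sized clusters, DEFF = 1 + (m − 1) × ICC, where m is the number of screws per cluster and ICC is the within-cluster correlation. In the sketch below, the cluster size of 3 follows from 30 screws per arm spread over 10 patients, assuming equal cluster sizes; the ICC of 0.5 is a purely hypothetical value, as no estimate is reported.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation factor for equally sized clusters."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n, cluster_size, icc):
    """Number of independent observations that n clustered ones are worth."""
    return n / design_effect(cluster_size, icc)


def inflated_half_width(half_width, cluster_size, icc):
    """CI half-width after inflating the variance by the design effect."""
    return half_width * math.sqrt(design_effect(cluster_size, icc))


screws_per_arm = 30
screws_per_patient_side = 3   # 30 screws per arm over 10 patients (assumed equal)
assumed_icc = 0.5             # hypothetical; not estimated in the study

deff = design_effect(screws_per_patient_side, assumed_icc)
n_eff = effective_sample_size(screws_per_arm, screws_per_patient_side, assumed_icc)
print(f"design effect: {deff:.2f}, effective screws per arm: {n_eff:.1f}")
```

With an ICC of zero the design effect equals 1 and the screw-based calculation is recovered; the stronger the resemblance between screws within a patient, the fewer independent observations each arm effectively contains.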

#### **5. Conclusions**

Although the benefits of 3DPG and its accuracy have been repeatedly demonstrated, this is the first randomized controlled study that compares 3DPG with CAS. The results of this RCT indicate that the accuracy of spinal screw insertion using 3DPG is similar and non-inferior to that obtained with CAS. Future higher-powered comparative studies should focus on specific subgroups of vertebral levels in which 3DPG has the potential to demonstrate superiority.

**Author Contributions:** Conceptualization, P.A.J.P., J.M.A.K., D.L.M.O., R.A.V., G.R., M.H.C., J.K. and R.J.M.G.; data curation, P.A.J.P. and J.M.A.K.; formal analysis, P.A.J.P., K.T. and J.M.A.K.; investigation, P.A.J.P., J.M.A.K., K.T., D.L.M.O., R.A.V., G.R., M.H.C., J.K. and R.J.M.G.; methodology, P.A.J.P., J.M.A.K., J.K. and R.J.M.G.; project administration, P.A.J.P.; resources, P.A.J.P. and J.K.; software, P.A.J.P. and J.K.; supervision, J.M.A.K., J.K. and R.J.M.G.; validation, P.A.J.P. and J.K.; visualization, P.A.J.P.; writing—original draft, P.A.J.P.; writing—review and editing, J.M.A.K., K.T., D.L.M.O., R.A.V., G.R., M.H.C., J.K. and R.J.M.G. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board (or Ethics Committee) of University Medical Center Groningen (ref. no. M19.229543 and date of approval 3-4-2019).

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study. Written informed consent has been obtained from the patients to publish this paper.

**Data Availability Statement:** The authors declare that the data supporting the findings of this study are available within the paper.

**Acknowledgments:** We thank Diane Steenks for her administrative support in executing this study.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

