Scoping Review of Machine Learning and Patient-Reported Outcomes in Spine Surgery

Christian Quinones; Deepak Kumbhare; Bharat Guthikonda; Stanley Hoang

doi:10.3390/bioengineering12020125

Department of Neurosurgery, Louisiana State University Health Shreveport, Shreveport, LA 71103, USA

Author to whom correspondence should be addressed.

Bioengineering 2025, 12(2), 125; https://doi.org/10.3390/bioengineering12020125

This article belongs to the Special Issue Artificial Intelligence and Machine Learning in Spine Research

Abstract

Machine learning is an evolving branch of artificial intelligence that is being applied in neurosurgical research. In spine surgery, machine learning has been used for radiographic characterization of cranial and spinal pathology and in predicting postoperative outcomes such as complications, functional recovery, and pain relief. A relevant application is the investigation of patient-reported outcome measures (PROMs) after spine surgery. Although a multitude of PROMs have been described and validated, there is currently no consensus regarding which questionnaires should be utilized. Additionally, studies have reported varying degrees of accuracy in predicting patient outcomes based on questionnaire responses. PROMs currently lack standardization, which renders them difficult to compare across studies. The purpose of this manuscript is to identify applications of machine learning to predict PROMs after spine surgery.

Keywords: artificial intelligence; machine learning; patient-reported outcomes; spine surgery; outcome measures; literature review; health informatics

1. Introduction

Research in spine surgery has been impacted by the recent rise in artificial intelligence (AI). Machine learning (ML) is a subset of AI that predicts outputs from given inputs. In medical research, input data may include any combination of the following: patient demographics, spinal pathology, imaging characteristics, surgical characteristics, comorbidities, and patient-reported outcome measures (PROMs) []. Examples of outputs are complications, functional outcomes, surgical success, hospitalization characteristics, readmission rates, reoperation rates, survival prediction, cost prediction, and rehabilitation needs. One area in which ML is particularly applicable is the prediction of PROMs after spine surgery.

When a patient is being evaluated for spine surgery, an important consideration is the degree of improvement that the patient experiences after surgical intervention. This question can be answered by comparing preoperative and postoperative PROMs. The original PROMs developed for use in spine surgery are currently referred to as “legacy outcome measures” and include the Oswestry Disability Index (ODI), Neck Disability Index (NDI) [], Visual Analog Scale (VAS), Short Form Health Survey (SF-36 or SF-12), Japanese Orthopaedic Association (JOA) score, Roland-Morris Disability Questionnaire (RMDQ), EuroQol-5D (EQ-5D), and Scoliosis Research Society (SRS) questionnaire []. These surveys provided the foundation for defining patient-oriented, clinically significant outcomes that assess quality of life after spine surgery []. To quantify a standard for expected PROM improvements, clinicians defined the Minimal Clinically Important Difference (MCID) for these PROMs []. Due to variations in spinal pathology, surgical interventions, patient demographics, and the intrinsic disadvantages of PROMs such as time to completion, there has been a lack of consensus on which PROMs to utilize. A 2022 literature review reported the presence of 206 unique spine-specific PROMs []. To address this, the National Institutes of Health developed the Patient-Reported Outcomes Measurement Information System (PROMIS) in an attempt to standardize PROMs and simplify their administration [].
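As a minimal sketch of how a preoperative-to-postoperative PROM comparison can be checked against an MCID, the following Python function flags whether a patient's change meets a chosen threshold. The function name, the example scores, and the 10-point threshold are illustrative assumptions rather than values taken from the reviewed studies; published MCIDs vary by instrument and by calculation method.

```python
def achieved_mcid(pre_score, post_score, mcid, lower_is_better=True):
    """Return True if the pre-to-post change meets or exceeds the MCID.

    lower_is_better=True suits disability or pain scales (e.g., ODI, VAS),
    where a falling score means improvement. Set it to False for scales
    where a rising score means improvement.
    """
    if mcid <= 0:
        raise ValueError("mcid must be a positive threshold")
    if lower_is_better:
        improvement = pre_score - post_score
    else:
        improvement = post_score - pre_score
    return improvement >= mcid


# Hypothetical patient: disability score falls from 40 to 25 points,
# checked against an assumed 10-point threshold.
print(achieved_mcid(40, 25, mcid=10))
```

Dichotomizing outcomes in this way is what turns a continuous PROM into the binary target that classification metrics such as the AUC require.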

The decision to proceed with spine surgery is often complex, largely because there are no definitive guidelines or universal indications for when surgery is appropriate. The use of ML to accurately predict patient outcomes grants surgeons another tool to more confidently advise patients on surgical outcomes []. The purpose of this manuscript is to describe the extent to which ML has been used to predict PROMs after spine surgery.

2. Materials and Methods

A scoping review of the literature per the Preferred Reporting Items for Systematic Reviews and Meta-Analyses for Scoping Reviews (PRISMA-ScR) guidelines [] was carried out in Web of Science, PubMed, and EMBASE on October 8, 2024. A combination of MeSH terms and keywords related to patient-reported outcomes and spine surgery were used. The search criteria for PubMed were as follows: (“Machine Learning” [MeSH] OR “Artificial Intelligence”) AND (“Patient Reported Outcome Measures” [MeSH] OR “Patient-reported outcomes”). The search criteria for Web of Science were as follows: (“Machine Learning” OR “Artificial Intelligence”) AND (“Patient Reported Outcome Measures” OR “Patient-reported outcomes” OR “PROMs” OR “Quality of Life” OR “Health Outcomes”) AND (“Spine” OR “Spinal Surgery”). The search criteria for EMBASE were as follows: (‘machine learning’/exp OR ‘machine learning’ OR ‘artificial intelligence’/exp OR ‘artificial intelligence’) AND (‘patient reported outcome’/exp OR ‘patient reported outcome’ OR ‘quality of life’/exp OR ‘quality of life’ OR ‘patient-reported outcomes’ OR ‘proms’ OR ‘health outcomes’/exp OR ‘health outcomes’) AND (‘spine’/exp OR ‘spine’ OR ‘spinal surgery’/exp OR ‘spinal surgery’ OR ‘spine surgery’/exp OR ‘spine surgery’) AND [english]/lim.

English articles published from 1994 to 2024 were selected. One researcher (C.Q.) assessed the manuscripts for eligibility under the supervision of another researcher (D.K.). In cases of disagreements or uncertainties requiring further clarification, the senior author (S.H.) was consulted and a consensus was reached during research team discussions. The inclusion criteria consisted of studies that utilized ML tools to predict postoperative PROMs for patients who underwent spine surgery. Studies that did not employ ML to predict PROMs were excluded. Data extracted included the ML method used, spine pathology, number of patients, features used for model prediction, and ML performance.

3. Results

3.1. Search Results

The initial search yielded 648 articles; 60 repeats were removed, resulting in 588 unique articles for screening. Of the 37 articles that met the initial screening criteria, twelve non-surgical studies were excluded. The remaining 25 articles were further assessed for eligibility, with 3 excluded for not predicting postoperative PROMs [,,]. A total of 22 articles were included in the qualitative synthesis (Figure 1).
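The screening counts reported above can be checked with a few lines of arithmetic. The sketch below is only a consistency check of the stated numbers; the variable names are illustrative.

```python
# Counts as stated in the search results paragraph.
identified = 648
duplicates_removed = 60
screened = identified - duplicates_removed            # unique articles screened

met_screening_criteria = 37
non_surgical_excluded = 12
assessed_for_eligibility = met_screening_criteria - non_surgical_excluded

not_predicting_postop_proms = 3
included = assessed_for_eligibility - not_predicting_postop_proms

print(screened, assessed_for_eligibility, included)
```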

Figure 1. Literature search strategy.

3.2. Study Details

Seven articles predicted outcomes in cervical spine pathologies [,,,,,,]. Three articles predicted outcomes for thoracolumbar pathologies [,,]. Eleven articles predicted outcomes for lumbar spine pathology [,,,,,,,,,,]. One study predicted outcomes for all levels of spinal pathology []. The postoperative timeline for PROM prediction ranged from 6 weeks to 24 months (Table 1).

Table 1. Study characteristics.

A total of twenty-one PROMs were reported. Seven articles reported the ODI [,,,,,,]; four articles each reported the VAS [,,,], mJOA [,,,], and numeric rating scale (NRS) [,,,]; three articles each reported the JOA [,,], NDI [,,], core outcome measure index (COMI) [,,], and SF-36 [,,]; two articles each reported the EQ-5D [,] and SRS [,]; and one article each reported the EuroQol [], Physical Component Summary (PCS) [], PROMIS-PF [], SF-6D [], Mental Component Summary (MCS) [], Mental Disability Index (MDI) [], Disabilities of the Arm, Shoulder, and Hand (DASH) [], North American Spine Society (NASS) [], Japanese Orthopaedic Association Back Pain Evaluation Questionnaire (JOABPEQ) [], neck pain [], and pain symptoms specific to quality of life, social disability, and work disability []. Table 2 provides a categorical breakdown and brief description of PROMs.

Table 2. Description of common patient-reported outcome measures.

Demographics were used as predictive features in all but one study []. Surgical characteristics were used in ten studies [,,,,,,,,,]. Spinal pathology characteristics were used in ten studies [,,,,,,,,,]. American Society of Anesthesiologists (ASA) classification was used in six studies [,,,,,]. Physical exam findings were used in seven studies [,,,,,,]. Past medical history (including surgical history) was used in five studies [,,,,]. Preoperative opioid use was included in four studies [,,,]. Hospitalization details were used in three studies [,,]. Social history (including employment details) was used in two studies [,]. One study used geographic details [].

Sixty unique ML models were used in the relevant studies. The most frequently used models were support vector machine (SVM), which was used in eight studies [,,,,,,,]; logistic regression (LR), which was used in seven studies [,,,,,,]; and random forest (RF), which was used in six studies [,,,,,]. Decision tree was used in four studies [,,,], and elastic net (EN) was used in three studies [,,]. Least absolute shrinkage and selection operator (LASSO) regression was used in three studies [,,], and neural network was used in three studies [,,]. The remaining ML models included Bayesian generalized linear models (BGLMs), boosted LR, extra trees, extreme gradient boosted trees, regression tree, Tree—AS, boosting, chi-squared, deep learning, dimensionality reduction factor analysis, EN penalized LR, EN regularization, EN, generalized additive models, generalized boosted, generalized boosted machines, generalized linear mixed model, k-nearest neighbors, linear—AS, multilayer perceptron, multivariable adaptive regression splines, multivariate linear regression, partial least squares, principal component analysis, ridge regression, simple BGLMs, single-layer artificial neural networks, stepwise regression, and stochastic gradient boosting. Model performance was most frequently reported as Area Under the Curve (AUC), which was reported in sixteen studies [,,,,,,,,,,,,,,,]. Model performance was also reported as the mean absolute error (MAE) in three studies [,,]. The remaining performance measures included mean bootstrapped R² [], mean absolute accuracy (MAA) [], and coefficients [].
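For the studies that reported MAE, the metric is the average absolute gap between predicted and observed scores, expressed in the units of the PROM itself. The following is a minimal sketch with hypothetical scores; the function name and data are assumptions for illustration and do not reproduce any reviewed study.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted scores."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    total = sum(abs(o - p) for o, p in zip(observed, predicted))
    return total / len(observed)


# Hypothetical postoperative scores for three patients.
observed_scores = [20, 30, 40]
predicted_scores = [25, 28, 40]
print(mean_absolute_error(observed_scores, predicted_scores))
```

Because MAE is scale-dependent, an MAE on a 0 to 5 subscale cannot be compared directly with an MAE on a 0 to 100 instrument, which is one reason results are hard to pool across studies.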

3.3. Key Results

Park et al. best predicted 3- and 24-month VAS after cervical spine decompression using LR, with AUCs of 0.762 and 0.773, respectively []. Pedersen et al. used seven ML models to predict EQ-5D, ODI, VAS leg pain (LP), VAS back pain (BP), and return to work after lumbar spine surgery, with mean AUCs of 0.82, 0.75, 0.73, 0.81, and 0.84, respectively []. Staartjes et al. employed a deep learning model to predict the ODI with an AUC of 0.84 and NRS BP improvement with an AUC of 0.9 []. Berjano et al. predicted postoperative ODI with a combination of preoperative ODI, SF-36 Physical Component Summary (PCS), and COMI Back with an AUC of 0.808 []. Halicka et al. used LR to predict COMI, BP, and LP with AUCs of 0.63, 0.72, and 0.68, respectively []. Karhade et al. utilized LR, neural networks, and EN penalized LR to predict PROMIS physical function, pain interference, and pain intensity, achieving AUCs of 0.75, 0.71, and 0.71, respectively, with the EN penalized LR achieving an AUC of 0.69 []. Merali et al. used random forest (RF) to predict SF-6D and mJOA with AUCs of 0.85, 0.83, and 0.87 at 6, 12, and 24 months, respectively []. Rigoard et al. found that changes in the Modified Clinical Response Index were the most accurate indicator of Patient Global Impression of Change, with an AUC of 0.853 []. This was higher than the AUCs for changes in the Hospital Anxiety and Depression Scale (HADS) (0.780), ODI score (0.737), Numerical Pain Rating Scale (NPRS) (0.704), EQ-5D index (0.698), and Pain Mapping Intensity score (0.672). Grob et al. used EN regularization to predict ODI, NRS BP, and LP with AUCs of 0.70, 0.72, and 0.70, respectively []. Zhang et al. used SVM to predict SF-36 PCS and Mental Component Summary (MCS) with an AUC of 86.4 and 89.8, respectively []. Gupta et al. used gradient boosting to predict SRS-pain and SRS self-image, with MAEs of 0.47 and 0.55, respectively []. Yagi et al. used an ensemble of the top five performing algorithms to predict JOABPEQ and VAS scores following lumbar spine surgery, with MAE values ranging from 9.3 to 16.5 []. Müller et al. used LASSO to predict COMI subdomains for back and neck pain with MAEs of 2.1 and 1.8, respectively []. Khan et al. used generalized boosted models and multivariable adaptive regression models to predict 24-month postoperative MCS and PCS with AUCs of 0.77 and 0.78, respectively []. Finkelstein et al. used LASSO regression to predict NRS after lumbar surgery with a mean bootstrapped R² of 0.12 []. Liew et al. conducted the only study evaluating cervical radiculopathy []. That study used four ML models to predict the NDI and EQ-5D; stepwise regression yielded the highest accuracy for the NDI, EQ-5D, and neck pain 12 months after cervical spine surgery. Siccoli et al. employed eight ML models to predict the ODI and NRS scores for BP and LP []. The 6-week postoperative AUC values were as follows: ODI 0.75, NRS-LP 0.79, and NRS-BP 0.92 (boosted trees model). At 12 months postoperatively, the AUC values were ODI 0.68, NRS-LP 0.72, and NRS-BP 0.79. Ames et al. used seven ML models to predict individual SRS-22R questions, achieving AUROC values as high as 86.9% (extreme gradient boosting tree) []. Staartjes et al. used EN regularization to predict the ODI and COMI, achieving an AUC of 0.67 []. The model yielded an AUC of 0.72 for BP and 0.64 for LP. Khan et al. utilized a polynomial SVM model to predict mJOA with an AUC of 0.834 []. Khor et al. applied three binary regression models to predict outcomes, achieving the following AUC values: ODI 0.66, BP 0.79, and LP 0.69 []. Hoffman et al. reported a mean absolute accuracy (MAA) of 0.0283 with the use of support vector regression (SVR) [].

4. Discussion

Predicting clinically relevant outcomes after spine surgery has been increasingly performed with patient-reported outcomes []. These questionnaires evaluate subjective and objective measures that aid surgeons in measuring a patient’s quality of life before and after surgical intervention, ultimately allowing for a better understanding of the physical and psychological burden of spinal pathology. By identifying subtle patterns in pathology, patient characteristics, and populations, ML has the potential to predict PROMs after spine surgery. A significant volume of studies has described PROMs, yet their clinical relevance remains undetermined because of the significant degree of heterogeneity among them []. To improve the consistency and completeness of prediction model reporting, the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement was devised as a set of evidence-based guidelines []. Only eleven studies [,,,,,,,,,] in the review claimed to adhere to the TRIPOD criteria.

PROMs often assess pain, functional status, and other relevant factors. Consistent with past literature reviews [], the mJOA, ODI, and SRS-22 were the most frequently predicted PROMs for cervical, lumbar, and spinal deformity pathologies, respectively. This fact highlights the emphasis placed on a patient’s physical function. For assessment of pain, tools like the VAS and NRS are commonly used to measure back and leg pain. Although both measure pain, some studies have found the VAS assessment to be more useful. For example, Bielewicz et al. found that VAS scores decreased to a greater degree than NRS scores three months after surgery [], attributing the poor reproducibility of the NRS to its less detailed incremental changes [].

The features used to predict PROMs included demographics, surgery characteristics, preoperative PROMs, spinal pathology characteristics, mental health evaluations, employment details, social history, ASA classification, comorbidities, imaging findings, fine motor function, hospital characteristics, and surgeon characteristics. Several physical exam findings have been identified as predictors of functional improvement after surgery []. For example, upper motor neuron signs have been associated with a decreased likelihood of recovery after lumbar spine surgery []. In addition to objective clinical findings, Finkelstein et al. found that cognitive factors accounted for 40% of the variance in PROMs after spine surgery []. This finding is consistent with a randomized controlled trial reporting that lumbar spine surgery patients who participated in cognitive behavioral-based physical therapy had greater improvements in pain and disability than those who received physical therapy-related educational training []. Preoperative opioid use has been identified as another factor that affects patient-reported outcomes after spine surgery. Given that unmanageable pain is often a primary reason for surgical intervention [], this variable should be further investigated for its role in predicting PROMs.

In this review, the AUC was the most frequently reported performance metric. The AUC summarizes the overall discriminative performance of an ML model, with values ranging from 0 to 1. Values closer to 1 indicate better performance []. The study reporting the highest AUC for cervical spine pathology was that by Khan et al., who used a polynomial SVM to predict mJOA with an AUC of 0.834 []. Siccoli et al. applied boosted trees to predict NRS-BP after lumbar spine surgery, achieving an AUC of 0.92 []. For adult spinal deformity, Ames et al. used extreme gradient boosting trees to predict individual SRS-22R questions, with AUC values as high as 0.869 []. Despite successful ML model performance, the clinical applicability of these models is limited due to the complexity of shared decision making between a patient and the provider. A review by Christodoulou et al. evaluated 71 studies investigating clinical prediction models and found no evidence of superior ML performance over LR [].
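To make the 0-to-1 scale concrete, the AUC for a binary outcome (for example, MCID achieved or not) can be read as the probability that a randomly chosen patient with the outcome receives a higher predicted score than a randomly chosen patient without it, so 0.5 corresponds to chance-level ranking. The sketch below computes it by direct pairwise comparison; the function name and the toy data are illustrative assumptions, and this is not the implementation used by any of the reviewed studies.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for patients with the outcome, 0 for those without.
    scores: model-predicted scores or probabilities; ties count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data: two patients with the outcome, two without.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.2]
print(pairwise_auc(toy_labels, toy_scores))  # 3 of 4 pairs ranked correctly
```

The pairwise form is quadratic in the number of patients, which is acceptable for illustration; statistical libraries typically use an equivalent rank-based calculation.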

A primary limitation of this review was the exclusion of articles not containing the term “machine learning” in the abstract or title. This may have excluded studies that employed ML to predict PROMs but did not explicitly use that term. As a result, a potential selection bias was introduced and the overall comprehensiveness of the review was reduced. Another limitation was the minimal volume of high-quality evidence. Due to the limited amount of evidence and the large degree of heterogeneity among studies, a comprehensive systematic review or meta-analysis could not be performed.

5. Conclusions

PROMs continue to be a valuable tool for assessing the impact of spine pathology on physical and mental health, but surgeon expertise remains pivotal when counseling patients. Providers should be aware of the evolving application of ML in both clinical and academic pursuits. Although ML models have the potential to accurately predict PROMs, their clinical applicability is severely limited by the variation in ML models, spinal pathology, input variables, and output variables across studies. Surgeons and researchers should collaborate to establish standardized outcome measures and evaluation metrics. Such a joint effort would help realize the potential of ML to predict postoperative PROMs.

Author Contributions

C.Q.: Contributed to Data curation, Formal analysis, Methodology, Writing—original draft, and Writing—review and editing. D.K.: Contributed to Supervision and Writing—review and editing. B.G.: Contributed to Supervision, Validation, and Writing—review and editing. S.H.: Contributed to Conceptualization, Formal analysis, Investigation, Methodology, Project administration, Supervision, Validation, Visualization, and Writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Tragaris, T.; Benetos, I.S.; Vlamis, J.; Pneumaticos, S.G. Machine Learning Applications in Spine Surgery. Cureus 2023, 15, e48078.
Vernon, H.; Mior, S. The Neck Disability Index: A Study of Reliability and Validity. J. Manip. Physiol. Ther. 1991, 14, 409–415.
McCormick, J.D.; Werner, B.C.; Shimer, A.L. Patient-Reported Outcome Measures in Spine Surgery. JAAOS J. Am. Acad. Orthop. Surg. 2013, 21, 99.
Franceschini, M.; Boffa, A.; Pignotti, E.; Andriolo, L.; Zaffagnini, S.; Filardo, G. The Minimal Clinically Important Difference Changes Greatly Based on the Different Calculation Methods. Am. J. Sports Med. 2023, 51, 1067–1073.
Jaeschke, R.; Singer, J.; Guyatt, G.H. Measurement of Health Status. Ascertaining the Minimal Clinically Important Difference. Control Clin. Trials 1989, 10, 407–415.
Beighley, A.; Zhang, A.; Huang, B.; Carr, C.; Mathkour, M.; Werner, C.; Scullen, T.; Kilgore, M.D.; Maulucci, C.M.; Dallapiazza, R.F.; et al. Patient-Reported Outcome Measures in Spine Surgery: A Systematic Review. J. Craniovertebral Junction Spine 2022, 13, 378–389.
Intro to PROMIS. Available online: https://www.healthmeasures.net/explore-measurement-systems/promis/intro-to-promis (accessed on 17 October 2024).
Young, R.R. Emerging Role of Artificial Intelligence and Big Data in Spine Care. Int. J. Spine Surg. 2023, 17, S3–S10.
Tricco, A.C.; Lillie, E.; Zarin, W.; O’Brien, K.K.; Colquhoun, H.; Levac, D.; Moher, D.; Peters, M.D.J.; Horsley, T.; Weeks, L.; et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Ann. Intern. Med. 2018, 169, 467–473.
Moustafa, I.M.; Ozsahin, D.U.; Mustapha, M.T.; Ahbouch, A.; Oakley, P.A.; Harrison, D.E. Utilizing Machine Learning to Predict Post-Treatment Outcomes in Chronic Non-Specific Neck Pain Patients Undergoing Cervical Extension Traction. Sci. Rep. 2024, 14, 11781.
Janssen, E.R.; Osong, B.; van Soest, J.; Dekker, A.; van Meeteren, N.L.; Willems, P.C.; Punt, I.M. Exploring Associations of Preoperative Physical Performance With Postoperative Outcomes After Lumbar Spinal Fusion: A Machine Learning Approach. Arch. Phys. Med. Rehabil. 2021, 102, 1324–1330.e3.
Wondra, J.P.I.; Kelly, M.P.; Greenberg, J.; Yanik, E.L.; Ames, C.P.; Pellise, F.; Vila-Casademunt, A.; Smith, J.S.; Bess, S.; Shaffrey, C.I.; et al. Validation of Adult Spinal Deformity Surgical Outcome Prediction Tools in Adult Symptomatic Lumbar Scoliosis. Spine 2023, 48, 21.
Durand, W.M.; Lafage, R.; Hamilton, D.K.; Passias, P.G.; Kim, H.J.; Protopsaltis, T.; Lafage, V.; Smith, J.S.; Shaffrey, C.; Gupta, M.; et al. Artificial Intelligence Clustering of Adult Spinal Deformity Sagittal Plane Morphology Predicts Surgical Characteristics, Alignment, and Outcomes. Eur. Spine J. 2021, 30, 2157–2166.
Park, C.; Mummaneni, P.V.; Gottfried, O.N.; Shaffrey, C.I.; Tang, A.J.; Bisson, E.F.; Asher, A.L.; Coric, D.; Potts, E.A.; Foley, K.T.; et al. Which Supervised Machine Learning Algorithm Can Best Predict Achievement of Minimum Clinically Important Difference in Neck Pain after Surgery in Patients with Cervical Myelopathy? A QOD Study. Neurosurg. Focus 2023, 54, E5.
Merali, Z.G.; Witiw, C.D.; Badhiwala, J.H.; Wilson, J.R.; Fehlings, M.G. Using a Machine Learning Approach to Predict Outcome after Surgery for Degenerative Cervical Myelopathy. PLoS ONE 2019, 14, e0215133.
Zhang, J.K.; Jayasekera, D.; Javeed, S.; Greenberg, J.K.; Blum, J.; Dibble, C.F.; Sun, P.; Song, S.-K.; Ray, W.Z. Diffusion Basis Spectrum Imaging Predicts Long-Term Clinical Outcomes Following Surgery in Cervical Spondylotic Myelopathy. Spine J. 2023, 23, 504–512.
Khan, O.; Badhiwala, J.H.; Witiw, C.D.; Wilson, J.R.; Fehlings, M.G. Machine Learning Algorithms for Prediction of Health-Related Quality-of-Life after Surgery for Mild Degenerative Cervical Myelopathy. Spine J. 2021, 21, 1659–1669.
Liew, B.X.W.; Peolsson, A.; Rugamer, D.; Wibault, J.; Löfgren, H.; Dedering, A.; Zsigmond, P.; Falla, D. Clinical Predictive Modelling of Post-Surgical Recovery in Individuals with Cervical Radiculopathy: A Machine Learning Approach. Sci. Rep. 2020, 10, 16782.
Khan, O.; Badhiwala, J.H.; Akbar, M.A.; Fehlings, M.G. Prediction of Worse Functional Status After Surgery for Degenerative Cervical Myelopathy: A Machine Learning Approach. Neurosurgery 2021, 88, 584.
Hoffman, H.; Lee, S.I.; Garst, J.H.; Lu, D.S.; Li, C.H.; Nagasawa, D.T.; Ghalehsari, N.; Jahanforouz, N.; Razaghy, M.; Espinal, M.; et al. Use of Multivariate Linear Regression and Support Vector Regression to Predict Functional Outcome after Surgery for Cervical Spondylotic Myelopathy. J. Clin. Neurosci. 2015, 22, 1444–1449.
Grob, A.; Rohr, J.; Stumpo, V.; Vieli, M.; Ciobanu-Caraus, O.; Ricciardi, L.; Maldaner, N.; Raco, A.; Miscusi, M.; Perna, A.; et al. Multicenter External Validation of Prediction Models for Clinical Outcomes after Spinal Fusion for Lumbar Degenerative Disease. Eur. Spine J. 2024, 33, 3534–3544.
Gupta, A.; Oh, I.Y.; Kim, S.; Marks, M.C.; Payne, P.R.O.; Ames, C.P.; Pellise, F.; Pahys, J.M.; Fletcher, N.D.; Newton, P.O.; et al. Machine Learning for Benchmarking Adolescent Idiopathic Scoliosis Surgery Outcomes. Spine 2023, 48, 1138.
Ames, C.P.; Smith, J.S.; Pellisé, F.; Kelly, M.; Gum, J.L.; Alanay, A.; Acaroğlu, E.; Pérez-Grueso, F.J.S.; Kleinstück, F.S.; Obeid, I.; et al. Development of Predictive Models for All Individual Questions of SRS-22R after Adult Spinal Deformity Surgery: A Step toward Individualized Medicine. Eur. Spine J. 2019, 28, 1998–2011.
Staartjes, V.E.; de Wispelaere, M.P.; Vandertop, W.P.; Schröder, M.L. Deep Learning-Based Preoperative Predictive Analytics for Patient-Reported Outcomes Following Lumbar Discectomy: Feasibility of Center-Specific Modeling. Spine J. Off. J. N. Am. Spine Soc. 2019, 19, 853–861.
Khor, S.; Lavallee, D.; Cizik, A.M.; Bellabarba, C.; Chapman, J.R.; Howe, C.R.; Lu, D.; Mohit, A.A.; Oskouian, R.J.; Roh, J.R.; et al. Development and Validation of a Prediction Model for Pain and Functional Outcomes After Lumbar Spine Surgery. JAMA Surg. 2018, 153, 634–642.
Pedersen, C.F.; Andersen, M.Ø.; Carreon, L.Y.; Eiskjær, S. Applied Machine Learning for Spine Surgeons: Predicting Outcome for Patients Undergoing Treatment for Lumbar Disc Herniation Using PRO Data. Glob. Spine J. 2022, 12, 866–876.
Berjano, P.; Langella, F.; Ventriglia, L.; Compagnone, D.; Barletta, P.; Huber, D.; Mangili, F.; Licandro, G.; Galbusera, F.; Cina, A.; et al. The Influence of Baseline Clinical Status and Surgical Strategy on Early Good to Excellent Result in Spinal Lumbar Arthrodesis: A Machine Learning Approach. J. Pers. Med. 2021, 11, 1377.
Halicka, M.; Wilby, M.; Duarte, R.; Brown, C. Predicting Patient-Reported Outcomes Following Lumbar Spine Surgery: Development and External Validation of Multivariable Prediction Models. BMC Musculoskelet. Disord. 2023, 24, 333.
Karhade, A.; Fogel, H.A.; Cha, T.D.; Hershman, S.H.; Doorly, T.P.; Kang, J.D.; Bono, C.M.; Harris, M.B.; Schwab, J.H.; Tobert, D.G. Development of Prediction Models for Clinically Meaningful Improvement in PROMIS Scores after Lumbar Decompression. Spine J. 2021, 21, 397–404.
Yagi, M.; Michikawa, T.; Yamamoto, T.; Iga, T.; Ogura, Y.; Tachibana, A.; Miyamoto, A.; Suzuki, S.; Nori, S.; Takahashi, Y.; et al. Development and Validation of Machine Learning-Based Predictive Model for Clinical Outcome of Decompression Surgery for Lumbar Spinal Canal Stenosis. Spine J. 2022, 22, 1768–1777.
Finkelstein, J.A.; Stark, R.B.; Lee, J.; Schwartz, C.E. Patient Factors That Matter in Predicting Spine Surgery Outcomes: A Machine Learning Approach. J. Neurosurg. Spine 2021, 35, 127–136.
Siccoli, A.; de Wispelaere, M.P.; Schröder, M.L.; Staartjes, V.E. Machine Learning–Based Preoperative Predictive Analytics for Lumbar Spinal Stenosis. Neurosurg. Focus 2019, 46, E5.
Staartjes, V.E.; Stumpo, V.; Ricciardi, L.; Maldaner, N.; Eversdijk, H.A.J.; Vieli, M.; Ciobanu-Caraus, O.; Raco, A.; Miscusi, M.; Perna, A.; et al. FUSE-ML: Development and External Validation of a Clinical Prediction Model for Mid-Term Outcomes after Lumbar Spinal Fusion for Degenerative Disease. Eur. Spine J. 2022, 31, 2629–2638.
Müller, D.; Haschtmann, D.; Fekete, T.F.; Kleinstück, F.; Reitmeir, R.; Loibl, M.; O’Riordan, D.; Porchet, F.; Jeszenszky, D.; Mannion, A.F. Development of a Machine-Learning Based Model for Predicting Multidimensional Outcome after Surgery for Degenerative Disorders of the Spine. Eur. Spine J. 2022, 31, 2125–2136.
Rigoard, P.; Ounajim, A.; Goudman, L.; Louis, P.-Y.; Slaoui, Y.; Roulaud, M.; Naiditch, N.; Bouche, B.; Page, P.; Lorgeoux, B.; et al. A Novel Multi-Dimensional Clinical Response Index Dedicated to Improving Global Assessment of Pain in Patients with Persistent Spinal Pain Syndrome after Spinal Surgery, Based on a Real-Life Prospective Multicentric Study (PREDIBACK) and Machine Learning Techniques. J. Clin. Med. 2021, 10, 4910.
Cooper, M.E.; Torre-Healy, L.A.; Alentado, V.J.; Cho, S.; Steinmetz, M.P.; Benzel, E.C.; Mroz, T.E. Heterogeneity of Reporting Outcomes in the Spine Surgery Literature. Clin. Spine Surg. 2018, 31, E221–E229.
Collins, G.S.; Moons, K.G.M.; Dhiman, P.; Riley, R.D.; Beam, A.L.; Calster, B.V.; Ghassemi, M.; Liu, X.; Reitsma, J.B.; van Smeden, M.; et al. TRIPOD+AI Statement: Updated Guidance for Reporting Clinical Prediction Models That Use Regression or Machine Learning Methods. BMJ 2024, 385, e078378.
Bielewicz, J.; Daniluk, B.; Kamieniak, P. VAS and NRS, Same or Different? Are Visual Analog Scale Values and Numerical Rating Scale Equally Viable Tools for Assessing Patients after Microdiscectomy? Pain Res. Manag. 2022, 2022, 5337483.
Archer, K.R.; Devin, C.J.; Vanston, S.W.; Koyama, T.; Phillips, S.E.; Mathis, S.L.; George, S.Z.; McGirt, M.J.; Spengler, D.M.; Aaronson, O.S.; et al. Cognitive-Behavioral-Based Physical Therapy for Patients With Chronic Pain Undergoing Lumbar Spine Surgery: A Randomized Controlled Trial. J. Pain 2016, 17, 76–89.
Khan, A.S.R.; Mattei, T.A.; Mercier, P.J.; Cloney, M.; Dahdaleh, N.S.; Koski, T.R.; El Tecle, N.E. Outcome Reporting in Spine Surgery: A Review of Historical and Emerging Trends. World Neurosurg. 2023, 179, 88–98.
Christodoulou, E.; Ma, J.; Collins, G.S.; Steyerberg, E.W.; Verbakel, J.Y.; Van Calster, B. A Systematic Review Shows No Performance Benefit of Machine Learning over Logistic Regression for Clinical Prediction Models. J. Clin. Epidemiol. 2019, 110, 12–22.

Figure 1. Literature search strategy.

Table 1. Study characteristics.

Article	Pathology	# of Pts	Predicted PROMs	ML Models	Input Features	PPT (Months)	Results (AUC) *
Liew et al. []	Cervical	193	NDI, EQ-5D, NP	Stepwise regression, LASSO, boosting, MARS	Demographics, PE, PROMs	12	Not reported
Park et al. []	CSM	535	VAS-NP	LR, SVM, DT, RF, extra trees, Gaussian naïve Bayes, KNN, multilayer perceptron, EGBT	Demographics, Sx chars, PROMs, spinal pathology, PE	3; 24	VAS-NP 0.773–0.762
Zhang et al. []	CSM	50	SF-36, PCS	SVM	Demographics, PE, PROs, imaging chars	6	SF-36 PCS 86.4, MCS 89.8
Merali et al. []	DCM	757	SF-6D, mJOA	RF, SVM, LR, DT, ANN models	Demographics, spinal pathology, Sx chars, comorbidities, PROMs, PE	3–24	SF-6D and mJOA 0.83–0.87
Khan et al. []	DCM	173	SF-36 MCS, SF-36 PCS	Classification trees, SVM, partial least squares, generalized boosted models, generalized additive models, MARS, RF, LR	Demographics, PE, comorbidities, Sx Hx, spinal pathology, mJOA	12	MCS 0.77, PCS 0.78
Hoffman et al. []	DCM	20	ODI	MLR, SVR	Demographics, spinal pathology, Sx chars, comorbidities, PROs, PE, fine motor function	6, 12, and 24	MAA of 0.0283 with SVR
Khan et al. []	DCM	702	mJOA	Boosted LR, SVM, naïve Bayes, generalized boosted machines, partial least squares, LR	Demographics, Sx chars, spinal pathology, PE	12	mJOA 0.834
Grob et al. []	Thoracolumbar	1115	ODI, BP (NRS), LP (NRS)	FUSE-ML, EN regularization	MRI, PROMs, demographics, ASA, PMHx, Sx Hx	12	ODI 0.70, BP 0.72, LP 0.70
Gupta et al. []	AIS	6076	SRS-Pain, SRS-Self-Image	LR, gradient boosting, EGBT	PROMs, demographics, spinal pathology, Sx chars	6; 12; 24	MAE 0.47–0.55
Pedersen et al. []	LDH	1968	ODI, VAS	DL, DT, RF, BT, SVM, LR, MARS	Demographics, PROs, employment details, comorbidities, self-reported expectations to return to work	24	EQ-5D 0.82, ODI 0.75, VAS LP 0.73, VAS BP 0.81, return to work 0.84
Ve et al. []	LDH	422	LP (NRS), BP (NRS), ODI	DL, LR	Demographics, ASA, PROMs, Sx chars, Sx Hx, spinal pathology, social Hx	12	BP 0.90, LP 0.87, ODI 0.84
Ames et al. []	ASD	561	Individual SRS-22R questions	EN, gradient boosting machines, EGBT, extreme gradient boosting linear, RF, EN regularized generalized linear models	Demographics, comorbidities, Sx chars, imaging chars, hospital chars, surgeon chars	12	SRS-22R questions 0.869 with EGBT
Karhade et al. []	LS	906	PROMIS-PF	Stochastic gradient boosting, RF, SVM, NN, EN penalized LR	Demographics, ASA, spinal pathology, Sx chars, PROMs, Rx opioids, geographic information	12	PROMIS-PF 0.75
Yagi et al. []	LS	848	VAS BP, VAS LP, JOABPEQ	Generalized LR, generalized linear mixed, LR, SVM, single-layer ANN, random trees, linear-AS, tree-AS, EGBT, chi-squared automatic interaction detection classification, regression tree	Demographics, Sx chars, PROMs	10	MAE 9.3–16.5
Siccoli et al. []	LS	635	BP (NRS), LP (NRS), ODI	RF, EGBT, BGLM, BT, KNN, simple BGLM, ANN with a single hidden layer	Clinical data, imaging chars, PROMs, demographics, ASA, Sx Hx, spinal pathology	6 weeks; 12 months	NRS-BP 0.79, 0.92
Khor et al. []	LS	1965	BP (NRS), LP (NRS), ODI	Binary LR	Demographics, clinical chars, ASA, Sx Hx, PROMs, comorbidities, Sx chars, Rx opioids, hospital chars	12	ODI 0.66, BP 0.79, and LP 0.69
Berjano et al. []	Lumbar	1243	ODI, SF-36, PCS, COMI Back	RF	Demographics, comorbidities, spinal pathology, PROMs, past Sx Hx	6	ODI 0.808
Finkelstein et al. []	Lumbar	122	NRS	LASSO regression	Clinical and demographic variables, PROMs, patient expectations and cognitive appraisal processes	10	NRS of 0.12 MBR2
Staartjes et al. []	Lumbar	1115	ODI, COMI, NRS	EN regularization	Demographics, Rx opioids, Sx Hx, Sx chars, PROMs	12	ODI and COMI 0.67
Halicka et al. []	Lumbar	4307	COMI-BP, COMI-LP	RF, LR	Demographics, Sx chars, hospitalization chars	3–24	COMI 0.63, BP 0.72, LP 0.68
Rigoard et al. []	Lumbar	200	PGIC	DRFA, PCA	ODI, EQ-5D, HADS, NRS	12	PGIC 0.853
Müller et al. []	Cervical and lumbar	10,002	COMI	LASSO, ridge regression	Demographics, Sx chars, surgeon chars, PROMs, psychological assessment	12	MAE back patients 2.1, neck patients 1.8

* = unless otherwise specified; # = number; AIS = adolescent idiopathic scoliosis; ANN = artificial neural network; ASA = American Society of Anesthesiologists; ASD = adult spinal deformity; AUC = area under the receiver operating characteristic curve (AUROC); BGLM = Bayesian generalized linear model; BP = back pain; BT = boosted tree; CSM = cervical spondylotic myelopathy; COMI = core outcome measure index; DRFA = dimensionality reduction factor analysis; DT = decision tree; EGBT = extreme gradient boosting tree; EN = elastic net; EQ-5D = EuroQol-5 Dimensions; HADS = the hospital anxiety and depression scale; Hx = history; JOABPEQ = Japanese Orthopaedic Association back pain evaluation questionnaire; KNN = k-nearest neighbors; LASSO = least absolute shrinkage and selection operator; LDH = lumbar disk herniation; LP = leg pain; LR = logistic regression; MAA = mean absolute accuracy; MAE = mean absolute error; MARS = multivariate adaptive regression spline; MBR2 = mean bootstrapped R2; MCS = Mental Component Summary; MCRI = multidimensional clinical response index; mJOA = modified Japanese Orthopaedic Association; MLR = multivariate linear regression; NDI = Neck Disability Index; NP = neck pain; NRS = numeric rating scale; PE = physical exam; PCA = principal component analysis; PCS = Physical Component Summary; PGIC = Patient Global Impression of Change; PMHx = past medical history; Pts = patients; PPT = postoperative prediction timeline; PROMIS-PF = Patient-Reported Outcomes Measurement Information System-Physical Function; RF = random forest; Rx = prescription; SF-6D = short form-6 dimensions; SF-36 = short form-36 health survey; SRS = Scoliosis Research Society; SVR = support vector regression; SVM = support vector machine; Sx = surgical; VAS = Visual Analog Scale.
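
Table 1 reports two main kinds of performance metric: AUC for models that classify a binary outcome, and MAE for models that predict a continuous PROM score. The minimal sketch below shows how each could be computed, which may help when comparing rows that use different metrics. It is illustrative only and is not taken from any of the included studies; the function names and the toy values (a hypothetical binary outcome flag, predicted probabilities, and ODI-like scores) are assumptions made for the example.

```python
from typing import Sequence


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean_absolute_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """MAE: average absolute difference between observed and predicted scores."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical toy data (not from any study in Table 1).
reached_outcome = [1, 0, 1, 1, 0, 0]            # 1 = patient reached the binary outcome
predicted_prob = [0.9, 0.4, 0.7, 0.3, 0.2, 0.6]  # model's predicted probability
observed_score = [20.0, 35.0, 10.0, 42.0]        # observed postoperative score
predicted_score = [24.0, 30.0, 13.0, 40.0]       # model's predicted score

print(f"AUC = {auc_score(reached_outcome, predicted_prob):.3f}")
print(f"MAE = {mean_absolute_error(observed_score, predicted_score):.2f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination, whereas MAE is expressed in the units of the PROM itself, so an MAE is only interpretable relative to the range of the scale being predicted.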

Table 2. Description of common patient-reported outcome measures.

Domain	PROM	Description
Multiple Outcomes	MCRI	Multidimensional Clinical Response Index (MCRI) evaluates pain, functional capacity, quality of life, and outcomes in spinal surgery patients with persistent spinal pain syndrome
	NASS	North American Spine Society (NASS) assesses outcomes and pain related to lumbar spine disease
	EQ-5D	EuroQol-5 Dimensions (EQ-5D) measures health status across five dimensions: mobility, self-care, usual activities, pain/discomfort, anxiety/depression
	COMI	Core Outcome Measures Index (COMI) measures the impact of back and leg pain, assessing pain, function, and quality of life
	SRS	Scoliosis Research Society (SRS) assesses function, pain, self-image, mental health, and satisfaction
Physical Function	NDI	Neck Disability Index (NDI) evaluates disability related to neck pain and its impact on daily activities
	JOA	Japanese Orthopaedic Association Score (JOA) assesses neurological function in patients with cervical myelopathy
	mJOA	Modified JOA (mJOA) evaluates functional impairment in cervical spine conditions
	ODI	Oswestry Disability Index (ODI) assesses disability due to lower back pain
	PROMIS-PF	Patient-Reported Outcomes Measurement Information System (PROMIS)-Physical Function (PF) assesses physical function and the ability to perform physical activities
	PCS	Physical Component Summary (PCS) is a subscore from SF-36 measuring physical health
	DASH	Disabilities of the Arm, Shoulder, and Hand (DASH) measures upper-extremity function, pain, and work and social activity participation
Mental Health	MCS	Mental Component Summary (MCS) assesses psychological well-being
	MDI	Mental Disability Index (MDI) measures mental health-related disability
	PGIC	Patient Global Impression of Change (PGIC) measures a patient’s overall perception of improvement or change in condition
Quality of Life	SF-36	Short Form-36 Health Survey (SF-36) assesses overall health-related quality of life across multiple domains (physical, mental, and social)
Quality of Life	SF-6D	Short Form-6 Dimensions (SF-6D) is a condensed version of SF-36 that measures a single index for health-related quality of life
Pain	VAS	Visual Analog Scale (VAS) measures intensity of pain using a 0–10 visual scale
Pain	NRS	Numeric rating scale (NRS) quantifies pain on a 0–10 scale
Social	JOABPEQ	Japanese Orthopaedic Association Back Pain Evaluation Questionnaire (JOABPEQ) evaluates the impact of back pain on physical and social functioning


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
