**The Relationship Between Gastrointestinal Comorbidities, Clinical Presentation and Surgical Outcome in Patients with DCM: Analysis of a Global Cohort**

**Aria Nouri 1,2, Jetan H. Badhiwala 3, So Kato 4, Hamed Reihani-Kermani 5, Kishan Patel 6, Jefferson R. Wilson 3, Insa Janssen 1, Joseph S. Cheng 2, Karl Schaller 1, Enrico Tessitore 1 and Michael G. Fehlings 3,\***


Received: 31 December 2019; Accepted: 18 February 2020; Published: 26 February 2020

**Abstract:** Degenerative cervical myelopathy (DCM) is the most common cause of spinal cord impairment in adults, presenting most frequently in patients 50 years or older. Gastrointestinal comorbidities (GICs) commonly occur in this group; however, their relationship with DCM has not been thoroughly investigated. It is the objective of the present study to investigate the difference between patients with or without GICs who are surgically treated for DCM. A cohort of 757 patients with clinical data and 458 with magnetic resonance imaging (MRI) data from the AOSpine North America and AOSpine International studies on DCM was evaluated. GICs were obtained at presentation and included gastric, intestinal, hepatic, and pancreatic conditions. Patients were dichotomized into 2 groups: those with GICs and those without GICs. Both clinical and MRI presentation, as well as baseline neurological and functional status, were compared. Neurological and functional outcomes at 2-year follow-up were also compared. GICs were present in 121 patients (16%). These patients were less commonly male (48.76% vs. 65.4%, *p* = 0.001) and were slightly less neurologically impaired based on the Nurick grade (3.05 ± 1.10 vs. 3.28 ± 1.16, *p* = 0.044) but not based on mJOA (12.74 ± 2.62 vs. 12.48 ± 2.76, *p* = 0.33). They also had a worse physical health score (32.80 ± 8.79 vs. 34.65 ± 9.38 *p* = 0.049), worse neck disability (46.31 ± 20.04 vs. 38.23 ± 20.44, *p* < 0.001), a lower prevalence of upper motor neuron signs (hyperreflexia, 70.2% vs. 78.9%, *p* = 0.037; Babinski's sign 24.8% vs. 37.3%, *p* = 0.008), and a higher rate of psychiatric comorbidities (31.4% vs. 10.4%, *p* < 0.0001). On MRI, GIC patients less commonly exhibited signal intensity changes (T2 hyperintensity, 49.2% vs. 75.6%, *p* < 0.001; T1 hypointensity, 9.7% vs. 21.1%, *p* = 0.036), and had a lower number of T2 hyperintensity levels (0.82 ± 0.98 vs. 1.3 ± 1.11, *p* = 0.001). 
There was no difference in surgical outcome between the groups. DCM patients with GICs are more likely to be female and have significantly more general health impairment and neck disability. However, these patients have fewer clinical and MRI features typical of more severe neurological impairment. This constellation of symptoms is considerably different from that typically observed in DCM, and it is therefore plausible that nutritional factors may contribute to this unique observation.

**Keywords:** cervical spondylotic myelopathy (CSM); prospective; multicenter; anterior; posterior

### **1. Introduction**

Degenerative cervical myelopathy is the most common cause of spinal cord impairment in industrialized countries and can lead to significant neurological and functional dysfunction, as well as reduced quality of life [1]. The underlying pathology is heterogeneous and can include intervertebral disc disease, arthritic changes, hypertrophy and/or ossification of the spinal canal ligaments, and spondylolisthesis, ultimately leading to spinal cord injury through static and dynamic injury mechanisms [1]. Depending on the number of cervical levels involved, the degree of cord compression, and the natural history, patients present with a wide-ranging spectrum of clinical manifestations [2,3]. Symptoms include hyperreflexia, weakness, numbness, and loss of proprioception/balance; clinical signs such as Hoffmann's sign, Babinski reflex, Lhermitte's phenomenon, ankle clonus, inverted brachioradialis reflex, and Romberg's sign may be elicited on clinical examination [2,4,5]. Neurophysiological examination may indicate changes in motor and sensory evoked potentials; MRI signal intensity changes on T2 and T1 may highlight injury to the spinal cord. These various clinical factors and examinations have been used to assess degree of neurological impairment and surgical outcome. However, relatively little research has been undertaken to assess how comorbidities, such as gastrointestinal disease, impact baseline neurological status and recovery potential in patients undergoing surgical treatment.

Gastrointestinal comorbidities (GICs) have the potential to influence the presentation and recovery of patients with myelopathy in a number of ways. For example, GICs can result in malnutrition (such as hypocupremia), anemia through blood loss, and vitamin B12 (B12) deficiency, all of which may impact spinal cord function or surgical recovery [6,7]. It has recently been suggested that B12 deficiency may be a common and under-recognized comorbidity in patients with DCM [8], and it is also a differential diagnosis [9,10]. It has also been shown that anemia is related to higher surgical morbidity, worse neurological status at baseline, worse neurological outcomes, and higher rates of medical complications, and that it raises the risk of complications by increasing the probability that a patient will require an allogeneic RBC infusion [11–13]. Other studies have shown that malnutrition increases 90-day major medical complications and 1-year mortality, and is a predictor of increased infection and wound dehiscence rates after lumbar spine surgery [14].

Given that identification of sequelae of GICs may have an important impact on the clinical management of DCM patients, it is the objective of the present paper to evaluate the influence of GICs on baseline neurological function and surgical outcomes for treatment of DCM.

### **2. Methods**

### *2.1. Study Data*

The combined AOSpine study cohort comprises 757 patients (AOSpine North American study, *n* = 278; AOSpine International Study, *n* = 479) [15,16]. The North American study was conducted between 2005 and 2007 and included 12 North American sites (11 USA, 1 Canada); the International study was conducted between 2007 and 2011 and included 16 global sites comprising 4 regions (North America, Latin America, Asia, and Europe). The primary study objective was to assess the safety and efficacy of surgical treatment for DCM and was previously reported [15,16]. Adult patients (≥18 years of age) were included if they had clinical signs and symptoms of myelopathy that were confirmed via imaging. Patients were excluded if they had an active infection, neoplastic disease, rheumatoid arthritis, ankylosing spondylitis, previous surgery, or concomitant signs of lumbar stenosis. Patient clinical data, general health (SF-36) [17], Neck Disability Index (NDI) [18], and neurological function (modified Japanese Orthopaedic Association score [mJOA] [19] and Nurick grade [20]) were assessed. The pain subscore of NDI, which ranges from 0 to 5, was assessed to specifically evaluate pain. GICs were recorded non-specifically as present or absent and included potential gastric, hepatic, pancreatic, and intestinal comorbidities. Research ethics board approval was given at each participating center, and external monitors were used to visit the sites.

### *2.2. MRI Data*

MRI (1.5T or 3T) acquisitions were performed according to local protocols (no standardized protocols were used), and typically included axial and sagittal T2-weighted and sagittal T1-weighted images. DICOM (Digital Imaging and Communications in Medicine) and conventional image formats (JPEG, TIFF) were reviewed. DICOMs were reviewed using Osirix (www.osirix-viewer.com; Pixmeo, Geneva, Switzerland). MRIs were available for 458 patients, and the prevalence and spectrum of DCM pathology were previously published [21]. MRIs were assessed for the presence and absence of specific pathologies (e.g., isolated disc pathology, spondylolisthesis), for the presence of T2 signal hyperintensity, and T1 signal hypointensity changes. Signal intensity changes on T2 and T1 were reviewed by 3 raters, and the relationship between these changes and clinical presentation, as well as surgical outcome, were previously reported [2]. Therein, inter-rater reliability for signal changes was reported as being in substantial agreement for T2 hyperintensity (Fleiss Kappa: 0.60), and in fair agreement for T1 hypointensity (Fleiss Kappa: 0.31).

### *2.3. Statistical Analysis*

Statistical analysis was performed with SPSS (version 25.0, IBM, Armonk, NY, USA). Patients with DCM were separated into groups comprising those with or without GICs. Continuous variables are presented as means and were compared using independent t-tests. Categorical variables are presented as proportions and were assessed using chi-square tests. A last-observation-carried-forward approach was used to impute missing data for follow-up at 2 years. Measures of neurological and functional impairment between patients with and without GICs were compared at baseline and 2-year follow-up (mean difference from baseline) using independent t-tests. The baseline pain subscore of NDI was compared using an independent t-test. As a sensitivity analysis, between-group comparisons of change in mJOA, Nurick grade, NDI, and SF-36 physical component summary (PCS) and mental component summary (MCS) from baseline were made with the use of mixed-effects models for repeated measures. Fixed effects for the presence of GICs (GICs vs. no GICs), time (1 year, 2 year), and time × GIC interaction were included. Comparisons of least-squares means between groups at each time point were performed using the appropriate contrasts within the mixed-effects models.
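The last-observation-carried-forward imputation mentioned above can be illustrated with a minimal Python sketch. The function name `locf`, the use of `None` for a missed visit, and the example scores are all hypothetical; the study itself used SPSS, and its exact imputation procedure is not described beyond the sentence above.

```python
from typing import List, Optional, Sequence


def locf(visits: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing visit with the most recent observed value.

    A visit that precedes any observation stays missing (None).
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in visits:
        if value is not None:
            last = value  # remember the latest observed score
        filled.append(last)
    return filled


# Hypothetical mJOA scores at baseline, 1 year, and 2 years (None = missed visit).
patient_scores = [12.0, 15.0, None]
print(locf(patient_scores))
```

In this toy case the missing 2-year value is replaced by the 1-year value, which is the behavior the approach is named for.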

### **3. Results**

There were 121 patients (16%) with GICs and 636 patients (84%) without GICs (Table 1). GIC patients were less commonly male (48.76% vs. 65.4%, *p* = 0.001) and were on average 2 years older than patients without GICs (57.98 ± 10.21 vs. 56.04 ± 12.10, *p* = 0.065), although the age difference did not reach statistical significance. Neurologically, GIC patients were marginally less impaired than patients without GICs (Nurick grade, 3.05 ± 1.10 vs. 3.28 ± 1.16, *p* = 0.044; mJOA, 12.74 ± 2.62 vs. 12.48 ± 2.76, *p* = 0.33) but had a higher rate of psychiatric comorbidities (31.4% vs. 10.4%, *p* < 0.0001). Patients with GICs also had worse physical disability (SF-36 PCS, 32.80 ± 8.79 vs. 34.65 ± 9.38, *p* = 0.049) and worse neck disability (NDI, 46.31 ± 20.04 vs. 38.23 ± 20.44, *p* < 0.001), but a lower prevalence of upper motor neuron signs (hyperreflexia, 70.2% vs. 78.9%, *p* = 0.037; Babinski's sign 24.8% vs. 37.3%, *p* = 0.008). Duration of symptoms was similar for patients with and without GICs. The baseline NDI pain subscore was significantly worse in patients with GICs than those without (2.27 ± 1.32 vs. 1.75 ± 1.31, *p* < 0.001).


**Table 1.** Patient demographics and clinical and MRI presentation.

NDI, Neck Disability Index; MRI, magnetic resonance imaging; mJOA, modified Japanese Orthopaedic Association scale; PCS, physical component summary; MCS, mental component summary.

On MRI, patients with GICs less commonly exhibited signal intensity changes (T2 hyperintensity, 49.2% vs. 75.6%, *p* < 0.001; T1 hypointensity, 9.7% vs. 21.1%, *p* = 0.036) and had a lower number of T2 hyperintensity levels (0.82 ± 0.98 vs. 1.3 ± 1.11, *p* = 0.001) than patients without GICs. However, there were no differences in the number of compressed levels or the prevalence of combined anterior–posterior compression.

There were no differences in neurological or functional outcomes at 2-year follow-up between patients with or without GICs (Tables 2 and 3).


**Table 2.** Surgical outcome at 2-years follow-up.

**Table 3.** Outcomes comparing GIC and non-GIC subgroups using linear mixed effects modeling.


\* Values are reported as least-squares means of change in outcome scores from baseline at each follow-up time point. Baseline scores are reported as mean values for each treatment group. † Values are reported as difference in means (95% CI). Confidence intervals have not been adjusted for multiplicity.

### **4. Discussion**

Gastrointestinal conditions are common in the elderly, and therefore, it is not surprising that GICs were prevalent in 16% of patients with DCM. What is clear from this study is that patients with GICs represent a unique cohort that is quite different from the typical DCM patient: (1) they are more commonly female, despite the fact that prevalence of DCM among males is greater among reported studies (1), (2) almost a third of patients have psychiatric comorbidities, a much higher prevalence than otherwise expected, (3) patients had a large discrepancy between their general health measure score and NDI vs. neurological measures, showing significantly increased general health and neck disability but milder neurological impairment, and (4) GIC patients showed significantly lower MRI evidence of cord injury, despite having only subtle differences in neurological function. Given the large sample of patients with GICs (*n* = 121) in the cohort and the substantial deviance of clinical presentation in multiple dimensions (i.e., demographic, general health, neck pain, and objective findings of neurologic injury), it is clear that the presence of GICs is influential, but due to its broad categorization, it is challenging to account for specific factors.

Generally, GICs can result in a number of potential conditions, including anemia (due to blood loss), as well as malnutrition and deficiencies of vitamins that are essential to spinal cord function (due to GI resections or inflammatory conditions) [6,7,22]. Anemia is usually easily identified through preoperative screening, and its preoperative presence should be managed prior to surgery to avoid complications. Indeed, it has been previously shown using the NSQIP database that preoperative anemia is an independent risk factor for complications, the need for perioperative blood transfusion, return to the operating room, and extended length of stay after cervical surgery [12,23]. From a different perspective, it has also been proposed through animal studies that spinal cord compression may result in irregular nervous stimulation of the stomach, a phenomenon termed neck-stomach syndrome [24]. However, this connection remains largely unexplored.

The potential role of nutritional or vitamin deficiencies in DCM has not been adequately investigated, and therefore, it is unclear how these patients would present. In general, it is known that nutritional factors contribute to the health of intervertebral discs, as the avascular nature of discs and their reliance on diffusion render them susceptible to injury due to undernutrition. In particular, nutritional levels must exceed a critical threshold for the cells to remain viable and active [25].

Two nutritional factors that may be specifically relevant to neurological function include B12 deficiency and copper deficiency (hypocupremia). Deficiency of either of these can result in both myelopathy and anemia [6,7]. Copper deficiency is rare, typically results from high zinc intake, gastric resection, or malabsorption, and in a majority of cases treatment does not reverse myelopathic injury [6]. In contrast, B12 deficiency is much more common: it has been estimated that subclinical or clinical deficiency exists in up to 20% of elderly patients [26]. Further, clinical manifestation of B12 deficiency can mimic DCM and present with T2 cord hyperintensity. B12 deficiency has been reported to occur concomitantly with DCM [27–29], as well as in patients with suspected DCM, but underlying subacute combined degeneration (SACD), who experienced a resolution of symptoms after B12 administration [9,10,30]. B12 deficiency is most commonly due to pernicious anemia, bowel resection, inflammatory bowel disorders, liver disease, or gastric atrophy [31,32]. Unfortunately, without lab work to corroborate a role for B12 deficiency in the present cohort, this remains speculative. However, B12 deficiency is also known to cause cognitive impairment and neuropsychiatric disease [33] and could be responsible for the high level of psychiatric comorbidities observed amongst patients with GICs in the present analysis. The relationship between psychiatric and gastrointestinal comorbidities, as well as other somatic symptoms such as back pain, has been previously reported [34,35]. For example, a recent study on irritable bowel syndrome concluded that psychiatric factors could contribute to predisposition, precipitation, and perpetuation of IBS symptoms [36]. Such findings suggest a potential explanation for the significantly different levels of neck disability between the two groups, as it is plausible that a higher rate of psychiatric comorbidities contributed to the higher rate of non-objectifiable symptoms. They also suggest that the high level of psychiatric symptoms may be the reason for the distinct clinical phenotype of this population.

Overall, the findings suggest that patients with GICs were less commonly severely neurologically impaired. This is evidenced by the lower prevalence of objective upper motor neuron signs (Babinski's reflex, hyperreflexia) and of MRI evidence typical of more severely impaired patients (T1 hypointensity). Despite these findings and a marginally lower Nurick grade, there was no statistically significant difference in surgical outcomes between patients with or without GICs.

### *Limitations*

A clear limitation of this study is the nonspecific nature of having classified patients into a single group of gastrointestinal comorbidities. It would have been preferable to know specific diagnoses; however, these data were not available. Furthermore, given that the main study was not focused on gastrointestinal disease, we may not have captured an accurate population prevalence. Due to this, caution needs to be taken in interpreting the results, as false positive relationships are possible. Further, we have hypothesized that the unique differences observed here are possibly due to nutritional deficiencies; however, further work is needed to corroborate this. Lastly, because MRI data were derived from multiple global sites, there was no standardized protocol used to obtain MRIs.

### **5. Conclusions**

Patients with GICs represent a unique cohort that is different from typical DCM patients: (1) they are more commonly female, (2) almost a third of patients have psychiatric comorbidities, and (3) they have worse general health and NDI findings, but less severe neurological deficits and less MRI evidence of neurological impairment. This constellation of symptoms is considerably different from that typically observed in DCM; it is therefore plausible that nutritional factors that frequently manifest in elderly patients may contribute to this unique observation.

**Author Contributions:** Conceptualization, A.N.; Methodology, A.N. and J.H.B.; Validation, A.N. and J.H.B.; Formal Analysis, A.N., J.H.B., S.K. and H.R.-K.; Investigation, A.N. and M.G.F.; Resources, A.N., J.S.C., E.T., K.S. and M.G.F.; Data Curation, A.N. and J.H.B.; Writing-Original Draft Preparation, A.N., J.H.B. and K.P.; Writing-Review & Editing, All authors; Supervision, J.S.C., E.T., K.S. and M.G.F.; Project Administration, A.N., J.H.B. and K.P.; Funding Acquisition, A.N. and M.G.F. All authors have read and agreed to the published version of the manuscript.

**Acknowledgments:** The authors would like to acknowledge AOSpine for funding the initial AOSpine CSM-NA and CSM-I studies. MGF would like to acknowledge support from the Halbert Chair in Neural Repair and Regeneration and the DeZwirek Family Foundation.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

**Validating the Transformation of PROMIS-GH to EQ-5D in Adult Spine Patients**

**Shreyas Panchagnula 1,†, Xin Sun 1,†, Julio D. Montejo 1,2, Aria Nouri 1,3,4, Luis Kolb 1, Justin Virojanapa 1,5, Joaquin Q. Camara-Quintana 1, Samuel Sommaruga 1,4, Kishan Patel 1, Nikita Lakomkin 6, Khalid Abbed 1 and Joseph S. Cheng 1,3,\***


Received: 1 August 2019; Accepted: 12 September 2019; Published: 20 September 2019

**Abstract:** Spinal disorders and associated interventions are costly in the United States, putting them in the limelight of economic analyses. The Patient-Reported Outcomes Measurement Information System Global Health Survey (PROMIS-GHS) requires mapping to other surveys for economic investigation. Previous studies have proposed transformations of PROMIS-GHS to EuroQol 5-Dimension (EQ-5D) health index scores. These models require validation in adult spine patients. In our study, PROMIS-GHS and EQ-5D were administered in random order to 121 adult spine patients. The actual health index scores were calculated from the EQ-5D instrument and estimated scores were calculated from the PROMIS-GHS responses with six models. Goodness-of-fit for each model was determined using the coefficient of determination (*R*<sup>2</sup>), mean squared error (MSE), and mean absolute error (MAE). Among the models, the model treating the eight PROMIS-GHS items as categorical variables (CATReg) was the optimal model with the highest *R*<sup>2</sup> (0.59) and lowest MSE (0.02) and MAE (0.11) in our spine sample population. Subgroup analysis showed good predictions of the mean EQ-5D by gender, age groups, education levels, etc. The transformation from PROMIS-GHS to EQ-5D had a high accuracy of mean estimate on a group level, but not at the individual level.

**Keywords:** EQ-5D; PROMIS; spine; transformation; quality of life; patient outcomes; validation

### **1. Introduction**

High costs associated with surgical treatment of spine disorders demand a larger role for cost-utility analyses of treatment options. Amidst socioeconomic limitations and finite resources, spinal disorders occur at a high frequency, incur high costs for the healthcare system, and are treated with a heterogeneity of interventions. According to the 2010 Global Burden of Disease Study, low back pain had the greatest number of years lost to disability out of 291 conditions studied [1,2], and the annual direct costs of care provided for patients with spine disorders have been estimated at \$90 billion [3]. Low back pain in particular presents a unique challenge, as there are numerous treatment modalities available whose comparative efficacy and value have not been fully substantiated [2].

Measuring the value of an intervention necessitates the use of a health utility score that encapsulates the health status, or patient-perceived overall health, at any given moment. Health status measures (HSMs) generally fall into two categories: (1) profile-based measures, such as the Patient-Reported Outcomes Measurement Information System (PROMIS) [4]; and (2) preference-based measures, such as EuroQol 5-Dimension (EQ-5D) [5]. Profile-based measures characterize health status by assigning a score to each of multiple domains of health. Preference-based measures characterize health status by providing a single utility score from multiple domains of health. The utility score, based on valuations of different health states, is central to estimation of quality-adjusted life years (QALY), cost-utility analysis, cost-effectiveness of interventions, and quantitation of health outcomes [2,6].

Many health status measures have been designed for generic or disease-specific use [7,8]. In 1990, EuroQol developed the EQ-5D three-level survey (EQ-5D-3L, abbreviated as EQ-5D below), a preference-based HSM with two parts: (1) a descriptive survey with five questions assessing five dimensions of health; and (2) a visual analog scale that permits a numeric self-assessment of general health [5]. Responses to the descriptive survey yield a health utility index score.

In 2007, the National Institutes of Health (NIH) developed the PROMIS Global Health Survey (PROMIS-GHS), a standardized, profile-based HSM with 10 self-reported global health items that summarize general perceptions of health [4]. This survey is freely available for public use and is increasingly adopted in clinical settings. However, economic analyses have classically been performed using preference-based measures, such as EQ-5D.

With an increased desire to determine the value of health care and an increase in the number of HSMs, there is growing interest in correlating different HSMs. In 2009, Revicki et al. facilitated a conversion from PROMIS-GHS to EQ-5D index scores using generic United States (US) population data [9]. Since then, many clinical studies have used this model (REVReg) when evaluating health outcomes of surgical and medical interventions [10–12]. While effective, such a conversion faces challenges and requires validation for specific patient populations or diseases. Furthermore, the design of the model itself and its parameters can be optimized.

For instance, in 2017, Thompson et al. proposed new models to optimize REVReg using linear and equipercentile equating [13]. Linear and equipercentile equating are linking techniques that, after predicting scores, assign profile-based responses to preference-based scores by aligning score distributions of the two scales. Using Revicki et al.'s original data set, they recreated Revicki et al.'s regression model (REVReg), applied linear equating to REVReg (REVLE), and applied equipercentile equating to REVReg (REVequip). In a similar fashion, they created three models by treating the item scores as categorical variables (CATReg, CATLE, CATequip) for a total of six models. They performed external validation of these models on a neurologic disease cohort from Cleveland Clinic.

In this study, we compared these six models in a cohort of adult spine patients to assess their ability to map PROMIS-GHS to EQ-5D in the spinal population.

### **2. Experimental Section**

### *2.1. Surveys*

A short demographics form was used to obtain gender, age, race/ethnicity, education, medical history, and spine diagnosis of participants.

The PROMIS Global Health survey includes ten global health items to assess overall health: (1) general health, (2) quality of life, (3) physical health, (4) mental health, (5) social satisfaction, (6) physical activities, (7) pain, (8) fatigue, (9) social activities, and (10) emotional distress. Every item except the pain item is rated on a numeric five-level scale (1 representing poor and 5 representing excellent); the pain item is scored from 0 to 10, where 0 indicates no pain and 10 indicates the worst imaginable pain. The pain item is then recoded to a five-level scale, and the fatigue and emotional distress items are recoded such that a high score represents better health status. Individual global item scores from completed PROMIS surveys were used to calculate estimates of EQ-5D index scores.
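The recoding step described above can be sketched in Python as follows. The text does not state the cut points used to collapse the 0 to 10 pain rating, so the mapping in `recode_pain` (0 to 5; 1–3 to 4; 4–6 to 3; 7–9 to 2; 10 to 1) is an assumption based on the commonly used PROMIS Global scoring convention, and both function names are hypothetical.

```python
def recode_pain(pain_0_to_10: int) -> int:
    """Collapse the 0-10 pain rating to a 1-5 scale where 5 is best.

    The cut points are an assumption; the source text does not list them.
    """
    if not 0 <= pain_0_to_10 <= 10:
        raise ValueError("pain rating must be between 0 and 10")
    if pain_0_to_10 == 0:
        return 5
    if pain_0_to_10 <= 3:
        return 4
    if pain_0_to_10 <= 6:
        return 3
    if pain_0_to_10 <= 9:
        return 2
    return 1


def reverse_five_level(response: int) -> int:
    """Reverse a 1-5 item (e.g., fatigue) so that a higher score means better health."""
    if not 1 <= response <= 5:
        raise ValueError("response must be between 1 and 5")
    return 6 - response


# Hypothetical responses: pain rated 7 out of 10, fatigue rated 4 (more fatigue).
print(recode_pain(7), reverse_five_level(4))
```

After recoding, every item points in the same direction, which is what allows the items to enter a single regression as predictors of the EQ-5D index.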

The EQ-5D is a preference-based instrument designed to measure generic health status across five dimensions of health: (1) mobility, (2) self-care, (3) usual activities, (4) pain/discomfort, and (5) anxiety/depression, with three response levels (no problems, some problems, extreme problems) [14]. A unique EQ-5D health state is defined by combining one level from each of the five dimensions, and each health state corresponds to a health index ranging from −0.109 to 1.0, with greater scores correlating to better overall health [15]. This index was calculated for every completed EQ-5D survey according to the valuations developed by Shaw et al. and derived from a large scale survey of the US general population [15]. The single visual analogue scale component of EQ-5D (EQ-5D VAS) was obtained but not evaluated in this study. Permission to use EQ-5D was granted by the EuroQol Group.

### *2.2. Study Design and Participants*

This study was primarily conducted in the adult spine clinics of three neurosurgeons (K.A., J.S.C., and L.K.) at Yale University School of Medicine in New Haven, CT, with Institutional Review Board approval. Figure 1 illustrates the design of the study. In these clinics, 146 adult (>18 years of age) spine patients were recruited in 2017 as they entered the clinic with voluntary consent regardless of their clinical status (pre-operative, post-operative, or non-operative). Three forms were administered on paper to these patients: a demographics short form, PROMIS-GHS, and EQ-5D. PROMIS-GHS and EQ-5D were administered in random order. Completion of these two survey components was essential for obtaining an EQ-5D index and corresponding index estimates from PROMIS Global Health items. Out of 146 patients, complete survey responses were obtained from 121 patients.

**Figure 1.** Sample Selection. PROMIS-GHS: Patient-Reported Outcomes Measurement Information System Global Health Survey; EQ-5D: EuroQol 5-Dimension; EQ-5D-3L: EQ-5D three-level survey.

### *2.3. Models Tested in the Study*

**REVReg:** This model was developed in 2009 by applying ordinary least squares (OLS) regression on the PROMIS Wave 1 Sample (i.e., the sample used by Revicki et al.) [13,16] to predict EQ-5D index scores from PROMIS-GHS items. This model uses eight out of 10 PROMIS-GHS items in its algorithm (excluding responses to general health and social satisfaction) and treats these items as continuous variables.

**REVLE**: This model is the result of applying linear equating, a method of linking, to REVReg. While regression models aim to predict preference-based scores from profile-based responses, linking models align score distributions of observed and predicted scores to establish a scale that provides an equivalent preference-based score for each set of profile-based responses. Linear equating is applied to REVReg with the following equation:

$$Y_{LE} = \mu_Y + \frac{\sigma_Y}{\sigma_{Y_R}} \left( Y_R - \mu_{Y_R} \right) \tag{1}$$

where $Y_{LE}$ is the estimated value from linear equating, $Y_R$ is the EQ-5D score predicted by REVReg, $\mu_Y$ and $\sigma_Y$ are the mean and standard deviation of the observed EQ-5D scores from the PROMIS Wave 1 Sample, respectively, and $\mu_{Y_R}$ and $\sigma_{Y_R}$ are the mean and standard deviation of the predicted EQ-5D scores from REVReg, respectively.
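As an illustration of Equation (1), the following Python sketch rescales regression predictions so that their mean and standard deviation match those of the observed scores. The reference values are toy numbers rather than the PROMIS Wave 1 Sample, the function name `linear_equate` is hypothetical, and the use of the population standard deviation (`pstdev`) is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def linear_equate(y_r: float, mu_y: float, sigma_y: float,
                  mu_yr: float, sigma_yr: float) -> float:
    """Apply Equation (1): shift and scale a regression prediction y_r."""
    if sigma_yr <= 0:
        raise ValueError("standard deviation of predictions must be positive")
    return mu_y + (sigma_y / sigma_yr) * (y_r - mu_yr)


# Toy reference sample (hypothetical): observed scores and regression predictions.
observed_ref = [0.40, 0.70, 1.00]
predicted_ref = [0.60, 0.70, 0.80]

mu_y, sigma_y = mean(observed_ref), pstdev(observed_ref)
mu_yr, sigma_yr = mean(predicted_ref), pstdev(predicted_ref)

equated = [linear_equate(y, mu_y, sigma_y, mu_yr, sigma_yr) for y in predicted_ref]
print(equated)
```

Regression predictions tend to be less spread out than the observed scores; linear equating stretches them back to the observed spread while leaving their rank order unchanged.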

**REVequip**: This model was developed by applying equipercentile equating to REVReg. Equipercentile equating is a linking method that matches the cumulative distribution functions of observed scores and predicted scores from REVReg using smoothing functions or nonparametric techniques.
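A bare-bones version of equipercentile equating, without the smoothing step mentioned above, might look like the following Python sketch. It maps a predicted score to the observed score that sits at the same percentile rank in a reference sample. The function name, the tie-handling rule, and the toy reference values are assumptions for illustration, not the procedure used by Thompson et al.

```python
from bisect import bisect_right
from typing import Sequence


def equipercentile_equate(x: float,
                          predicted_ref: Sequence[float],
                          observed_ref: Sequence[float]) -> float:
    """Map x to the observed score with the same empirical percentile rank.

    Uses unsmoothed empirical distributions; ranks are handled with integer
    arithmetic to avoid floating-point rounding at the percentile boundaries.
    """
    pred_sorted = sorted(predicted_ref)
    obs_sorted = sorted(observed_ref)
    # Number of reference predictions less than or equal to x.
    count = bisect_right(pred_sorted, x)
    # Integer ceiling of count * n_obs / n_pred: the smallest observed order
    # statistic whose empirical CDF reaches the percentile rank of x.
    k = -(-count * len(obs_sorted) // len(pred_sorted))
    index = min(max(k - 1, 0), len(obs_sorted) - 1)
    return obs_sorted[index]


# Toy reference distributions (hypothetical values).
predicted_ref = [0.5, 0.6, 0.7, 0.8]
observed_ref = [0.2, 0.6, 0.8, 1.0]
print(equipercentile_equate(0.8, predicted_ref, observed_ref))
```

Unlike linear equating, this mapping can be nonlinear, so it can reproduce skewed or bounded score distributions that a simple shift and scale cannot.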

**CATReg**: This model was implemented in 2017 by Thompson et al. Like REVReg, this model utilizes OLS regression on the PROMIS Wave 1 sample to predict EQ-5D index scores from eight PROMIS-GHS items. Unlike REVReg, CATReg treats these items as categorical variables.
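The difference between treating an item as continuous and treating it as categorical comes down to how the regression predictors are built. The Python sketch below shows one common encoding, indicator (dummy) variables with a reference level; the choice of level 1 as the reference and the function names are assumptions, and no model coefficients are implied.

```python
from typing import List, Sequence


def as_continuous(responses: Sequence[int]) -> List[float]:
    """One predictor per item: the response itself (REVReg-style coding)."""
    return [float(r) for r in responses]


def as_categorical(responses: Sequence[int],
                   levels: Sequence[int] = (1, 2, 3, 4, 5),
                   reference: int = 1) -> List[float]:
    """One indicator per non-reference level of each item (CATReg-style coding)."""
    row: List[float] = []
    for r in responses:
        if r not in levels:
            raise ValueError(f"unexpected response level: {r}")
        row.extend(1.0 if r == level else 0.0
                   for level in levels if level != reference)
    return row


# Hypothetical recoded responses to the eight items used by the models.
responses = [3, 4, 2, 5, 3, 1, 4, 3]
print(len(as_continuous(responses)), len(as_categorical(responses)))
```

With eight five-level items, the continuous coding gives 8 predictors and the categorical coding gives 32, which lets each response level have its own effect instead of assuming equal spacing between levels.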

**CATLE**: This model is the result of applying linear equating to CATReg.

**CATequip**: This model was developed by applying equipercentile equating to CATReg.

### *2.4. Statistical Analysis*

Statistical analyses were conducted in RStudio [17]. Responses to each of the 121 completed EQ-5D surveys were utilized to calculate an EQ-5D index score according to the valuations developed by Shaw et al. [15]. Estimates of the EQ-5D index scores from PROMIS Global Health Item responses were obtained by applying the six models developed by Revicki et al. and Thompson et al. (REVReg, REVLE, REVequip, CATReg, CATLE, CATequip) [9,13].

The goodness of fit for each model in our sample of patients was measured with the Pearson correlation coefficient (*r*), coefficient of determination (*R*<sup>2</sup>), mean squared error (MSE), and mean absolute error (MAE). The correlation coefficient *r* measures the strength of the linear relationship; higher absolute values indicate stronger linear correlations. *R*<sup>2</sup> indicates how much variance is explained by the regression model. The MSE and MAE were measured to examine the scale of difference between each estimate and observed value. Models with lower MSE or MAE have better predictions.
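The four goodness-of-fit measures can be computed directly from paired observed and estimated scores, as in the Python sketch below. Here *R*<sup>2</sup> is taken as the square of Pearson's *r*, which is one common convention; the text does not say which definition was used, so this is an assumption, and the example scores are hypothetical.

```python
from math import sqrt
from typing import Dict, Sequence


def fit_metrics(observed: Sequence[float], estimated: Sequence[float]) -> Dict[str, float]:
    """Return Pearson r, r squared, MSE, and MAE for paired scores."""
    if len(observed) != len(estimated) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    r = cov / sqrt(var_o * var_e)
    errors = [e - o for o, e in zip(observed, estimated)]
    mse = sum(err ** 2 for err in errors) / n
    mae = sum(abs(err) for err in errors) / n
    return {"r": r, "r2": r * r, "mse": mse, "mae": mae}


# Hypothetical EQ-5D index scores: estimates are perfectly correlated with the
# observed values but compressed toward the middle of the scale.
observed = [0.2, 0.5, 0.8]
estimated = [0.3, 0.5, 0.7]
metrics = fit_metrics(observed, estimated)
print(metrics)
```

In this toy case the correlation is perfect while MSE and MAE are nonzero, which shows why error-based measures are reported alongside correlation-based ones.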

In addition, actual EQ-5D scores and the estimates from the optimal model were compared by subgroup, including gender, age group, ethnicity, education, and spine diagnosis. According to Luo et al., 0.04 is the recommended minimal clinically important difference for an EQ-5D utility score on a scale from −0.109 to 1 [18]. If the mean difference is less than 0.04, we consider the estimate of the mean accurate.
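The group-level accuracy criterion can be expressed as a simple check of the mean difference against the 0.04 threshold. The sketch below uses hypothetical subgroup data; only the threshold value comes from the text.

```python
from statistics import mean

MCID = 0.04  # minimal clinically important difference cited in the text


def group_mean_difference(observed, estimated):
    """Difference between the mean observed and mean estimated scores of a group."""
    return mean(observed) - mean(estimated)


def is_accurate_group_estimate(observed, estimated, mcid=MCID):
    """True if the absolute mean difference is smaller than the MCID."""
    return abs(group_mean_difference(observed, estimated)) < mcid


# Hypothetical subgroup scores, for illustration only.
subgroup_observed = [0.50, 0.70, 0.90]
subgroup_estimated = [0.55, 0.65, 0.88]
accurate = is_accurate_group_estimate(subgroup_observed, subgroup_estimated)
```

This check concerns only the group mean; individual estimates within an "accurate" group can still differ from the observed scores by more than the threshold.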

However, good linear correlation does not always imply good agreement. To evaluate the transformation at the individual level, a Bland–Altman assessment of agreement was conducted, which shows visually the difference between the actual and estimated scores of each patient. Histograms of the observed EQ-5D scores and of the estimates from each model were also plotted to show the distributions of scores.
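A minimal sketch of the quantities behind a Bland–Altman plot is given below, using the conventional 95% limits of agreement (mean difference ± 1.96 SD of the differences). The function name and data are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(actual, estimated):
    """Return the mean difference (bias), 95% limits of agreement, and plot points."""
    diffs = [a - e for a, e in zip(actual, estimated)]
    averages = [(a + e) / 2 for a, e in zip(actual, estimated)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return {
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
        "points": list(zip(averages, diffs)),  # (x, y) pairs for the plot
    }


# Hypothetical scores, for illustration only.
result = bland_altman([0.6, 0.8, 0.4, 0.9], [0.5, 0.8, 0.5, 0.8])
```

A small bias with wide limits of agreement indicates that the estimates are accurate on average but imprecise for any single patient.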

### **3. Results**

### *3.1. Demographic Characteristics*

Table 1 contains the demographics of the experimental cohort of adult spine patients. Our cohort of 121 patients had an average age of 59 years, was 59% female, and was majority Caucasian. The highest level of education ranged from less than high school (4%) to an advanced college degree (17%), with 33% completing high school, 31% having some college or an associate's degree, and 14% having a bachelor's degree. Patients had a variety of conditions in their medical histories, including cancer, lung disease, psychiatric illness, heart disease, rheumatologic disease, central nervous system (CNS) disorders, and liver/kidney disease.


**Table 1.** Demographic and clinical characteristics of survey participants.

The cohort of this study had demographics comparable to the sample of the general US population studied by Revicki et al. [9] and the neurologic disease cohort studied by Thompson et al. [13]. Unlike the subjects of Revicki et al. and Thompson et al., however, all subjects in our sample had spine diagnoses, including cervical and lumbar stenosis (most common), deformity, myelopathy, radiculopathy, spondylolisthesis, fracture, tumor, and pseudoarthrosis. The specificity of spine diagnosis distinguishes the cohort of this study from the general cohort of Revicki's study.

### *3.2. Statistical Analysis*

Table 2 presents the metrics used to assess the models applied to our sample. The mean estimated score from the CATReg model (0.60) was closest to the mean observed EQ-5D index score (0.62). The mean difference was 0.012 (95% CI, −0.012 to 0.036, *p* = 0.3144), indicating no significant difference between the actual EQ-5D scores and the CATReg estimates. The estimates from all other models differed significantly from the actual scores on the paired *t*-test. The *R*<sup>2</sup> values for all six models ranged between 0.54 and 0.59. Pearson correlation coefficients were all above 0.7, showing strong linear correlation. Of the six models, CATReg had the highest *R*<sup>2</sup> (0.59) and the lowest MSE (0.02) and MAE (0.11). Thus, CATReg was the optimal model among the six.


**Table 2.** Mean (standard deviation (SD)) of actual and estimated EQ-5D Index Scores, *R*<sup>2</sup> values, correlation coefficients, mean squared errors (MSE), and mean absolute errors (MAE) for models in the spine patient sample (*N* = 121).

To investigate the accuracy of the CATReg model predictions, subgroup analysis was also performed (Table 3). Within most subgroups, the mean difference was less than 0.04 (the minimal clinically important difference of the EQ-5D score), which means that the mean EQ-5D score could be accurately predicted using PROMIS-GHS. For example, the female spine patients' observed EQ-5D score was 0.62 and the CATReg estimate was 0.60 (95% CI, 0.56–0.64), while for males the values were 0.60 and 0.60 (95% CI, 0.55–0.66). Caucasian Americans had a higher average EQ-5D score (actual 0.64 vs. estimated 0.64) than other ethnicities (actual 0.52 vs. estimated 0.50). The actual scores across education levels ranged from 0.53 to 0.70. Generally, the larger the group, the better the prediction. All subgroups with more than 17 patients had a mean difference of less than 0.04, which indicates that this score transformation is more appropriately used at the group level than at the individual level.



To investigate prediction performance at the individual level, Bland–Altman analysis was conducted. Figure 2 shows that the mean residual was 0.01, with 95% limits of agreement between actual and CATReg-estimated EQ-5D scores ranging from −0.25 to 0.27. For a single patient, therefore, the estimate can deviate from the actual score by far more than the minimal clinically important difference of 0.04.

**Figure 2.** Bland–Altman agreement plot. The *X* axis is the average of the actual EQ-5D score and the score estimated by the CATReg model. The *Y* axis is the difference between the two. Each dot represents a patient. The three dashed lines are the upper 95% limit of agreement (mean + 1.96 SD), the mean difference, and the lower 95% limit of agreement (mean − 1.96 SD).

Figure 3 depicts histograms of the observed EQ-5D-3L scores and estimates from REVReg, REVLE, and REVequip in our sample. Figure 4 depicts histograms of the observed EQ-5D scores and estimates from CATReg, CATLE, and CATequip. In both figures, the histograms of regression estimates and linear equating estimates resemble a normal distribution, while the histograms of the observed 3L scores and equipercentile equating estimates have a bimodal distribution. These histograms confirmed that the estimates from the transformation models are not a good match on an individual level.

**Figure 3.** Histograms of observed EQ-5D index scores and estimates from REVReg, REVLE, and REVequip.

**Figure 4.** Histograms of observed EQ-5D index scores and estimates from CATReg, CATLE, and CATequip.

### **4. Discussion**

### *4.1. Validation and Technical Aspects*

Our study assessed and compared six models that were developed in a generic sample to map PROMIS-GHS to EQ-5D in a specific sample of patients with spinal disorders. In our sample of patients with spinal disease, all six models achieved an *R*<sup>2</sup> greater than 0.5. According to Brazier et al., models that map to preference-based scores commonly achieve an *R*<sup>2</sup> greater than 0.5 within the sample used for model development [19]. As a measure of goodness of fit, *R*<sup>2</sup> indicates how well a model explains the dataset on which it was estimated; however, it does not show the magnitude of the prediction errors. In that regard, MSE and MAE better assess mapping functions by indicating the size of prediction errors [19]. We therefore compared the models using all of the goodness-of-fit indicators.

First, we confirmed that treating PROMIS-GHS item scores 1 to 5 as categorical variables (CATReg) performed better than treating them as continuous variables (REVReg), with a closer mean estimate (0.60 vs. 0.57; actual score = 0.62), higher *R*<sup>2</sup> (0.59 vs. 0.57), and lower MAE (0.11 vs. 0.13). However, contrary to the recommendation in the Thompson et al. article to use equating techniques, in our spine sample the linear and equipercentile equating models (REVLE, REVequip, CATLE, and CATequip) did not perform as well as CATReg. Thus, although all six models demonstrated adequate predictive ability, the CATReg model is the optimal one for patients with spine disease.

Second, we recommend using this transformation from PROMIS-GHS to EQ-5D utility scores for group-level mean estimates, not for individual prediction. The subgroup analysis showed that accurate prediction (mean difference less than 0.04) was achieved in groups with more than 17 patients. To be more conservative, we suggest sample sizes of at least 30 patients for a good mean estimate of an EQ-5D score from PROMIS-GHS using the CATReg model.

### *4.2. Utility of Health Care Measurement*

HSMs have often been validated in patients with spinal disease before clinical application. For instance, Guilfoyle et al. validated the Medical Outcomes Study Short Form (SF-6, -12, -36), a general health outcome measure, in patients with lumbar disc prolapse, lumbar canal stenosis, and degenerative cervical myeloradiculopathy. This study found strong correlation between SF surveys and disease-specific measures such as the Roland Morris Disability Score (RMDS), Myelopathy Disability Index (MDI), and Hospital Anxiety and Depression Scales (HADS) [20,21]. Similarly, EQ-5D was assessed for its validity for use in spine surgery by comparison with the Oswestry Disability Index (ODI) in a study of patients who underwent lumbar spine surgery for degenerative disorders [21,22]. According to the study, EQ-5D and ODI were equal in assessment of health state, thus validating the use of EQ-5D in patients with spinal disorders.

The validation of EQ-5D and other HSM questionnaires in patients with spinal disorders paved the way for assessing the value of spinal interventions using health utility index scores. For instance, Witiw et al. assessed the lifetime incremental cost-utility of surgical treatment for degenerative cervical myelopathy in a prospective observational cohort study by calculating health utility and QALYs from SF-6D [23]. Tosteson et al. used data from the Spine Patient Outcomes Research Trial (SPORT) to determine that lumbar discectomy was a clinically beneficial and cost-effective treatment of intervertebral disc herniation [24]. They also determined that spinal stenosis surgery was cost-effective but degenerative spondylolisthesis surgery was not cost-effective over a period of two years [25]. Conclusions from Tosteson et al. were based on the use of the EQ-5D index to obtain measures of QALY and incremental cost-effectiveness ratio.

Cost-utility studies of spinal interventions have also used estimation models to obtain health utility scores from other surveys. Qureshi et al. investigated the cost-effectiveness of anterior cervical discectomy and fusion (ACDF) and cervical disc replacement (CDR) as therapies for single-level cervical degenerative disc disease (DDD) [26]. To do this, the group used results of the 36-Item Short Form Health Survey (SF-36) from the ProDisc-C investigational device exemption study along with a model generated to estimate preference-based index scores from the Short Form-6 dimensions (SF-6D) (derived from a subsection of SF-36 items) [27].

Mapping PROMIS to EQ-5D can prove to be a powerful method of calculating health utility in economic cost-benefit studies. Along with its increased use by the NIH, PROMIS and its domain item banks allow flexibility in administration using either targeted short forms or computerized adaptive tests [4,9]. The importance of validating models such as those developed by Revicki et al. [9] and Thompson et al. [13] lies in assessing the clinical and economic utility of applying generic models to disease-specific populations, including those with spinal pathologies.

### *4.3. Clinical Implications*

The findings in this paper indicate that PROMIS can act as a reasonable surrogate for EQ-5D. Hospitals or medical centers that have already collected PROMIS-GHS but not EQ-5D could use this transformation to estimate EQ-5D scores and then calculate quality-adjusted life-years for cost-effectiveness analysis. Based on previous reports and our data, it appears that CATReg is the choice with the lowest error for patients with spinal disorders.
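To illustrate how estimated utility scores might feed a cost-effectiveness analysis, the sketch below computes quality-adjusted life-years as the area under a utility-time curve (trapezoidal rule) and an incremental cost-effectiveness ratio. The area-under-the-curve approach is one common convention and is an assumption here, not a method described in this study; all function names, costs, and utilities are hypothetical.

```python
def qalys_from_utilities(times_years, utilities):
    """QALYs as the area under the utility-time curve (trapezoidal rule)."""
    if len(times_years) != len(utilities) or len(times_years) < 2:
        raise ValueError("need at least two matched time points and utilities")
    total = 0.0
    for i in range(1, len(times_years)):
        dt = times_years[i] - times_years[i - 1]
        if dt < 0:
            raise ValueError("time points must be in ascending order")
        total += dt * (utilities[i] + utilities[i - 1]) / 2
    return total


def icer(cost_new, qaly_new, cost_ref, qaly_ref):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained."""
    delta_qaly = qaly_new - qaly_ref
    if delta_qaly == 0:
        raise ValueError("QALYs are identical; the ratio is undefined")
    return (cost_new - cost_ref) / delta_qaly


# Hypothetical group-mean utilities at baseline, 1 year, and 2 years.
qaly_treated = qalys_from_utilities([0, 1, 2], [0.6, 0.7, 0.8])
qaly_control = qalys_from_utilities([0, 1, 2], [0.6, 0.6, 0.6])
ratio = icer(cost_new=30000, qaly_new=qaly_treated, cost_ref=20000, qaly_ref=qaly_control)
```

Because the mapped scores are reliable mainly as group means, inputs to such a calculation would be group-level mean utilities rather than individual patient estimates.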

Measurement of health status not only assesses the general cost-effectiveness of interventions but also provides the opportunity to assess individual patients longitudinally. Consequently, one can assess changes under conservative management, and treatment modalities can be altered in accordance with health status. Regular clinical use of HSMs develops a general repository of health outcomes data that would otherwise come solely from research studies, potentially alleviating substantial costs for prospective research studies.

### *4.4. Limitations*

One limitation of this study was the sample size of the cohort. Although the cohort had a variety of spine pathologies, our sample was limited to clinics in a single institution, so the results may not represent the whole spine population. Second, we tried to create our own prediction model; however, only three of the 10 PROMIS-GHS items (general health, social satisfaction, pain) were significant predictors, owing to the limited sample size.

### **5. Conclusions**

This study assesses and compares six models that map PROMIS-GHS to EQ-5D index values in a population of patients with spinal disorders. All six models demonstrate adequate and comparable predictive performance in our sample, thus validating their economic utility. Among the six models, the CATReg model is recommended for spine patients. That is, EQ-5D utility scores could be most accurately estimated by a linear combination of eight significantly correlated items from PROMIS-GHS, with the scores 1 to 5 for each item treated as a categorical variable. In addition, we suggest using this transformation model for group-based estimates rather than for estimating an individual patient's EQ-5D score. Validation studies of HSMs can lead to their application in cost-utility analyses.

**Author Contributions:** Conceptualization, S.P., X.S., J.D.M., A.N., L.K., J.V., J.Q.C.-Q., S.S., N.L. and J.S.C.; methodology, S.P., X.S., J.D.M., A.N., L.K., J.V., J.Q.C.-Q., N.L., and J.S.C.; software, S.P., X.S. and J.D.M.; formal analysis, S.P., X.S. and A.N.; investigation, S.P., X.S., J.D.M., A.N., L.K., J.V., S.S., K.A., and J.S.C.; data curation, S.P., X.S. and A.N.; writing—original draft preparation, S.P.; writing—review and editing, X.S., J.D.M., A.N., K.P. and J.S.C.; visualization, S.P., and X.S.; supervision, J.S.C. and A.N.; project administration, X.S., A.N., J.S.C.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Journal of Clinical Medicine* Editorial Office E-mail: jcm@mdpi.com www.mdpi.com/journal/jcm
