1. Introduction
Systemic sclerosis (SSc), also known as scleroderma, is an immune-mediated rheumatic disease characterized by excessive collagen deposition in the skin and other organ systems, including the musculoskeletal, cardiopulmonary, renal, and gastrointestinal systems. Women are predominantly affected, with a female-to-male ratio of 3–8:1, especially in the limited cutaneous SSc (lcSSc) form, while the diffuse type affects males and females at more comparable rates. The incidence of first symptoms usually peaks between 45 and 64 years of age [
1,
2].
Heterogeneity in the clinical presentation of SSc may lead to different outcomes over the disease course. In some cases, the disease may remain stable for several months, while in other patients, particularly those with the diffuse cutaneous SSc (dcSSc) subtype, it can follow a fulminant clinical course. Treatment is prescribed according to disease manifestations, with immunosuppressive (IS) drugs being the standard of care [
3].
Due to multiorgan involvement, SSc is associated with a significant impairment of health-related quality of life (HRQoL) and high morbidity and mortality [
4,
5]. The evaluation of disease severity and activity usually requires several clinical examinations. Self-administered patient-reported outcomes (PROs) collected through health status questionnaires are a practical, inexpensive, reliable, and valid method to assess the functional repercussions of some rheumatic diseases from the patients’ perspective.
Initially, the Health Assessment Questionnaire Disability Index (HAQ-DI) was developed for rheumatoid arthritis [
6,
7], but it has proven to be a valuable tool to predict and assess outcomes in SSc. In 1991, Poole and Steen [
8,
9] added five specific visual analog scales (VASs) to address overall disease severity, Raynaud’s phenomenon, digital tip ulcers, and gastrointestinal and lung symptoms, creating a more disease-specific measure for SSc, the Scleroderma HAQ (SHAQ). The SHAQ is an accurate and feasible multisystem-specific tool to measure disease status changes that has been widely used in SSc [
9]. This questionnaire was translated and validated in several languages, such as Brazilian Portuguese [
10,
11], Chinese [
12], French [
13], Italian [
14], Japanese [
15], Spanish [
16], Swedish [
17], and Turkish [
18], but not yet into European Portuguese. Thus, the purpose of this study was to create and validate the SHAQ for Portuguese patients with SSc.
2. Materials and Methods
We followed the good practice principles to culturally adapt health outcome instruments to other linguistic contexts [
19] and the COSMIN taxonomy [
20]. This means that we tested the reliability (internal consistency and reproducibility) and the validity (content, construct, and criterion) of the obtained Portuguese version.
2.1. Cultural Adaptation and Content Validity
Before initiating this study, we contacted MAPI Research Institute to obtain permission to validate the Portuguese version of the SHAQ. We were informed that a non-validated Portuguese version already existed, following the Food and Drug Administration (FDA) guidance on translation [
21], and that we should use the version located on the MAPI website (
https://eprovide.mapi-trust.org; accessed on 6 August 2021).
This MAPI version was the result of a forward–backward translation process; however, we still considered it necessary to perform a clinical review with two rheumatologists and a cognitive debriefing with patients to validate the content of this Portuguese version.
For the clinical review, the rheumatologists received a document presenting the English and Portuguese versions of each questionnaire item and were asked to give one of the following answers: (i) the translation is correct; (ii) the translation is wrong and an alternative is suggested; or (iii) the translation is not incorrect, but a proposed alternative would be better. For the cognitive debriefing, two panels of five patients each were created. These panels approximately reflected the joint age–sex distribution of patients with this pathology and prioritized patients with the lowest possible literacy. In each panel, patients were given the questionnaire to fill out, after which they were asked whether any questions were missing, repeated, or ambiguous.
2.2. Participants
To validate the SHAQ, we then invited consecutive patients from five Portuguese Hospital Centers to participate in this study, between January and April 2022. Included patients fulfilled the 2013 American College of Rheumatology (ACR)/European League Against Rheumatism (EULAR) criteria for the classification of diffuse or limited SSc [
22], combined with the EUSTAR (European Scleroderma Trial and Research Group) criteria for very early diagnosis of SSc (VEDOSS) [
23], or presented SSc sine scleroderma. Patients were required to be autonomous, aged between 18 and 80 years, and able to understand Portuguese and to grant informed consent. Pregnant women were excluded.
To test the reliability of the SHAQ, a smaller group of patients was also randomly selected to complete the measurement instrument a second time one month after the previous consultation.
The study was approved by the Ethics Committee of the Regional Health Authority of the Centre (ARSC 14/2020) and by the Ethics Committee from one of the main hospital centers (CHTV 05/16/09/2021). We also obtained authorization from all heads of the five rheumatology departments involved. Each participant signed a written consent form before filling out the questionnaire.
2.3. Measurement Instruments
Included patients were asked to complete sociodemographic, lifestyle, and clinical information; the Portuguese versions of the generic questionnaires to measure health status (SF-36v2) and quality of life (EQ-5D-5L); the University of California Los Angeles Scleroderma Clinical Trial Consortium Gastrointestinal Tract Instrument (UCLA GIT 2.0); and the SHAQ.
Regarding the sociodemographic variables, we collected data on sex, age, marital and employment status, and years of education. Smoking and alcohol consumption were used as proxies for lifestyle. Lastly, the clinical variables measured were the SSc subset classification, disease duration since diagnosis, 2013 ACR/EULAR classification criteria, and organ involvement. Immunosuppression was defined as exposure for more than 6 months to at least one of the following: mycophenolate mofetil (MMF), cyclophosphamide (CYC), methotrexate (MTX), azathioprine (AZA), leflunomide, glucocorticoids (>10 mg/d prednisone-equivalent), rituximab, tocilizumab, or abatacept.
The Short Form Health Survey (SF-36v2) is a generic instrument that measures individuals’ perception of their health status on a scale from 0 (death) to 100 (perfect health status) [
24]. It assesses eight dimensions (physical functioning—PF, bodily pain—BP, role limitations due to physical health—RP, general health perception—GH, mental health—MH, role limitations due to emotional problems—RE, vitality—VT, and social functioning—SF) and provides two component summary measures, a physical (PCS) and a mental one (MCS). In the case of the Portuguese version [
25], these summary measures are normalized to the Portuguese general population.
The EuroQoL EQ-5D-5L is a generic preference-based quality of life questionnaire that measures five dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression) [
26]. Each dimension has five levels of intensity. A visual analog scale EQ-VAS also asks for self-perception of general health status. Portuguese utilities can be computed by an algorithm based on general public preferences [
27] and Portuguese norms are also available [
28].
The UCLA GIT 2.0 questionnaire is a recognized and reliable instrument to assess gastrointestinal (GI) symptoms in SSc patients and their impact on quality of life [
29]. Recently validated to the Portuguese population [
30], it measures seven HRQoL dimensions (reflux, distention/bloating, fecal soilage, diarrhea, social functioning, emotional wellbeing, and constipation) and has been used in several clinical trials of GI treatments in patients with SSc as an outcome measure [
31,
32]. The total UCLA GIT score is calculated by averaging all the subscales, except the one for constipation, and ranges from 0 (best HRQoL) to 2.83 (worst HRQoL). The levels of GI severity symptoms used in this paper were described by the author [
33].
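As an illustration of the total-score rule described above, the following minimal Python sketch averages the six scales that enter the total and leaves constipation out. The function name, the dictionary keys, and the example values are assumptions made for this illustration; they are not part of the instrument or of the study data.

```python
from statistics import mean

# Scales that enter the total score (constipation is deliberately absent).
# The key names are illustrative labels, not official item identifiers.
TOTAL_SCALES = (
    "reflux",
    "distension_bloating",
    "fecal_soilage",
    "diarrhea",
    "social_functioning",
    "emotional_wellbeing",
)


def ucla_git_total(scale_scores):
    """Average the six scale scores that make up the total GIT score."""
    missing = [name for name in TOTAL_SCALES if name not in scale_scores]
    if missing:
        raise ValueError(f"missing scale scores: {missing}")
    return mean(scale_scores[name] for name in TOTAL_SCALES)


# Hypothetical patient: the constipation score is present but ignored.
example = {
    "reflux": 0.5,
    "distension_bloating": 1.0,
    "fecal_soilage": 0.0,
    "diarrhea": 0.5,
    "social_functioning": 0.25,
    "emotional_wellbeing": 0.75,
    "constipation": 2.0,
}
total = ucla_git_total(example)
```

Severity bands are not encoded here because the cut-offs are defined in the cited source rather than in this text.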
The SHAQ comprises the HAQ-DI plus six additional VASs: pain, intestinal, breathing, Raynaud’s, finger ulcer, and overall disease severity. The HAQ-DI contains 20 items and measures eight domains: (i) dressing and grooming, (ii) arising, (iii) eating, (iv) walking, (v) hygiene, (vi) reach, (vii) grip, and (viii) activities [
34]. Each question is answered on a scale from (0) without any difficulty to (3) unable to do, with intermediate options (1) with some difficulty and (2) with much difficulty. The highest score among the questions of a domain determines the score for that domain, except when aids or devices are needed, in which case the domain score is raised to two. The composite score is the average of the eight domains and ranges from 0 to 3, with lower scores indicating less functional impairment.
Each additional VAS has a 1-week recall period and is represented by a 10 cm line. The VAS value, measured in centimeters, is multiplied by 0.3 to obtain the final score, which ranges from 0 (minimum limitation) to 3 (maximum limitation).
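The scoring rules above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the official scoring program: the function names and the input layout (one pair of item responses and an aid/device flag per domain) are assumptions, and the handling of missing items is not covered.

```python
from statistics import mean


def domain_score(item_scores, uses_aid_or_device=False):
    """Score one HAQ-DI domain from its item responses (each 0 to 3)."""
    if not item_scores or any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("item responses must be integers from 0 to 3")
    score = max(item_scores)  # the highest item response sets the domain score
    if uses_aid_or_device:
        score = max(score, 2)  # aids or devices raise a lower score to 2
    return score


def haq_di(domains):
    """Average the eight domain scores; `domains` holds (items, aid_flag) pairs."""
    if len(domains) != 8:
        raise ValueError("the HAQ-DI has eight domains")
    return mean(domain_score(items, aid) for items, aid in domains)


def vas_score(distance_cm):
    """Rescale a mark on the 10 cm VAS line to the 0 to 3 range."""
    if not 0 <= distance_cm <= 10:
        raise ValueError("the VAS mark must lie between 0 and 10 cm")
    return distance_cm * 0.3


# Hypothetical responses for the eight domains (20 items in total).
patient = [
    ([0, 1], False),     # dressing and grooming
    ([0, 0], True),      # arising, with an aid
    ([3, 1, 0], False),  # eating
    ([1, 1], False),     # walking
    ([0, 0, 0], False),  # hygiene
    ([2, 0], False),     # reach
    ([1, 0, 0], False),  # grip
    ([0, 0, 0], True),   # activities, with an aid
]
composite = haq_di(patient)
```

In this hypothetical case the domain scores are 1, 2, 3, 1, 0, 2, 1, and 2, so the composite score is 12/8 = 1.5.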
2.4. Reliability
We tested the reliability of the Portuguese SHAQ version through internal consistency and intertemporal test–retest stability.
Internal consistency was assessed with Cronbach’s alpha coefficient, for which accepted values should be between 0.70 and 0.90 [
21]. Intertemporal stability was tested with the intraclass correlation coefficient (ICC), computed from two measurements taken one month apart. We used the two-way mixed effects model with absolute agreement between the ratings of patients. We also followed the criteria under which an ICC lower than 0.50 corresponds to a weak correlation, an ICC between 0.50 and 0.75 to a moderate one, an ICC between 0.75 and 0.90 to a good one, and an ICC higher than 0.90 to an excellent one [
35].
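For readers who want to reproduce these two reliability statistics outside SPSS, the sketch below shows one way to compute them in plain Python. It is a sketch under stated assumptions: the text does not say whether single or average measures were used, so the single-measure absolute-agreement form is assumed here; the function names are illustrative; and the treatment of values falling exactly on a band boundary is a choice made for this example.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha; `rows` holds one list of item scores per respondent."""
    k = len(rows[0])
    item_variances = sum(variance(column) for column in zip(*rows))
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variances / total_variance)


def icc_absolute_agreement(rows):
    """Single-measure, absolute-agreement ICC from a two-way ANOVA layout.

    `rows` holds one list per patient, e.g. [test_score, retest_score].
    """
    n, k = len(rows), len(rows[0])
    grand = mean(x for row in rows for x in row)
    ss_rows = k * sum((mean(row) - grand) ** 2 for row in rows)
    ss_cols = n * sum((mean(col) - grand) ** 2 for col in zip(*rows))
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ms_rows = ss_rows / (n - 1)                      # between-patient mean square
    ms_cols = ss_cols / (k - 1)                      # between-occasion mean square
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def icc_label(icc):
    """Map an ICC to the bands quoted in the text."""
    if icc < 0.50:
        return "weak"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


# Hypothetical data: a retest that is uniformly one point higher lowers
# absolute agreement even though the ranking of patients is unchanged.
shifted = icc_absolute_agreement([[1, 2], [2, 3], [3, 4]])
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```

The shifted example is included because it shows why absolute agreement is stricter than consistency: a constant offset between test and retest reduces the coefficient.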
2.5. Validity
Construct validity tests included both structural validity and hypothesis testing with samples of sociodemographic and clinical variables. Criterion validity was then tested by comparing the HAQ-DI and SHAQ VAS scores with the scores obtained by the SF-36v2 and EQ-5D-5L [
20].
To test structural validity, we conducted an exploratory factor analysis based on principal component estimates, after assessing sampling adequacy via the Kaiser–Meyer–Olkin (KMO) indicator and Bartlett’s test of sphericity. A KMO smaller than 0.50 is considered unacceptable and one between 0.50 and 0.60 poor, while scores between 0.60 and 0.70, between 0.70 and 0.80, between 0.80 and 0.90, or higher than 0.90 are seen as fair, average, good, or very good, respectively [
36]. The significance of the Bartlett sphericity test should be smaller than 0.001 [
37].
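To make the KMO indicator concrete, the following sketch computes the overall KMO from a correlation matrix as the ratio of squared correlations to squared correlations plus squared partial correlations, and then applies the bands quoted above. It is a minimal illustration in plain Python and not the SPSS implementation; the function names, the small matrix inversion helper, and the example matrix are assumptions, and boundary values are assigned to the upper band by choice.

```python
from math import sqrt


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [float(i == j) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = invert(corr)
    n = len(corr)
    sum_r2 = sum_p2 = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                # Partial correlation of items i and j given all other items.
                partial = -inv[i][j] / sqrt(inv[i][i] * inv[j][j])
                sum_r2 += corr[i][j] ** 2
                sum_p2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_p2)


def kmo_label(value):
    """Map a KMO value to the bands quoted in the text."""
    bands = [(0.50, "unacceptable"), (0.60, "poor"), (0.70, "fair"),
             (0.80, "average"), (0.90, "good")]
    for upper, label in bands:
        if value < upper:
            return label
    return "very good"


# Hypothetical three-item correlation matrix with all correlations equal to 0.5.
example_kmo = kmo([[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]])
```

For the hypothetical matrix, each partial correlation is 1/3, so the KMO is 0.25 / (0.25 + 1/9) = 9/13, which falls in the fair band.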
The hypothesis testing was performed with known sociodemographic (sex, age group) and clinical variable groups, based on the distribution of each HAQ-DI variable. Student’s t-test was used to compare two independent groups and ANOVA to compare more than two independent groups.
To assess the criterion validity, we computed Pearson’s correlations between SHAQ items, SF-36v2 physical summary measures, and EQ-5D-5L index scores. We followed Cohen’s [
38] rule, under which correlations smaller than 0.30 are considered weak, those between 0.30 and 0.50 moderate, and those higher than 0.50 strong.
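The correlation step and the interpretation rule can be sketched as follows. The helper names are illustrative, the rule is applied to the absolute value of the coefficient (an assumption, since disability and health status scores run in opposite directions), and a value of exactly 0.50 is placed in the moderate band by choice.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cohen_label(r):
    """Classify the strength of a correlation using the bands quoted in the text."""
    size = abs(r)  # the sign gives the direction, not the strength
    if size < 0.30:
        return "weak"
    if size <= 0.50:
        return "moderate"
    return "strong"


# Hypothetical scores that move in exactly opposite directions.
r = pearson_r([0, 1, 2, 3], [3, 2, 1, 0])
```

A negative coefficient between a disability score and a health status score is expected, because higher disability goes with lower health status.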
The statistical software used was SPSS v.28 (IBM, Armonk, NY, USA).
3. Results
3.1. Content Validity
Content validity was tested through a clinical review involving two rheumatologists and a cognitive debriefing conducted with two panels of five patients each. Neither procedure resulted in any changes to the Portuguese version. That is, the Portuguese version of the SHAQ was accepted in clinical terms and accepted by patients without significant comment.
3.2. Sample
This study’s sample was composed of 102 SSc patients who immediately agreed to participate, slightly above 100, the minimum sample size proposed by the COSMIN taxonomy [
20].
Table 1 presents the sociodemographic and lifestyle behavior distributions, as well as the main clinical characteristics.
Our sample consisted mainly of females (82.4%) and patients older than 50 years (70.6%). The majority were married (69.6%) and had at least seven years of education (58.9%). Regarding lifestyle behaviors, a small percentage were alcohol drinkers (15.7%) and an even smaller percentage were smokers (5.9%).
Of the patients, 62.7% had limited SSc and 29.4% had the diffuse form. More than half of the patients (56.9%) had less than five years of disease duration, Raynaud’s phenomenon was present in 94.1%, and abnormal nailfold capillaries were present in 79.4% of the cases. Regarding skin manifestations, sclerodactyly and telangiectasia were also prevalent, at 63.7% and 62.7%, respectively. The autoantibody profile showed a large majority of patients with positivity to ANA (89.2%), followed by anti-centromere (59.0%) and anti-topoisomerase I (19.6%).
Forty patients were under IS therapy, with MTX being the most frequent drug (21.8%), followed by MMF (11.9%) and AZA (5.9%). Only four patients were receiving glucocorticoid therapy.
Table 2 presents the generic health status and quality of life scores, as well as the UCLA GIT 2.0 domains and SHAQ scores.
Scores were lower for the physical health status dimensions than for the mental dimensions. The highest scores occurred on the SF-36 dimensions of ‘social role functioning’ (70.1) and ‘emotional role functioning’ (64.3). The overall quality of life was slightly above average (66.8).
The total UCLA GIT score indicated mild severity, with a mean of 0.39 ± 0.45; the most affected domains were distension/bloating (mean 0.69 ± 0.78), constipation (mean 0.47 ± 0.64), and reflux (mean 0.46 ± 0.53).
Concerning the SHAQ questionnaire, the VASs with the worst scores were, in descending order, overall disease severity (mean 35.2 ± 25.4), pain (mean 31.7 ± 22.8), and Raynaud’s (mean 26.2 ± 28.8). The mean HAQ-DI score was 0.58 ± 0.51.
3.3. Reliability
Thirty-one patients completed the retest. The internal consistency of the HAQ-DI was high (Cronbach’s α = 0.866), and ICC scores are presented in
Table 3.
The lowest ICC, classified as moderate, was obtained for the SHAQ Raynaud’s VAS. All the others can be classified as good. The ICC corresponding to the HAQ-DI can even be considered excellent.
3.4. Construct Validity
Before testing structural validity, we assessed the suitability of the data for factor analysis. The KMO value was 0.798, higher than the recommended value of 0.6, and Bartlett’s test of sphericity was associated with a significance <0.001. Using the Kaiser criterion of eigenvalues higher than 1.0, the factor analysis yielded a single factor, corresponding to an eigenvalue of 3.073, evidencing the unidimensionality of the six SHAQ VASs. In the corresponding component matrix, the VAS item with the highest loading was ‘overall disease severity’ (0.850), followed by ‘pain’ (0.766), ‘Raynaud’ (0.730), ‘gastrointestinal tract’ (0.719), ‘lung involvement’ (0.623), and ‘digital ulcers’ (0.571). Despite this unidimensionality, the VASs are analyzed separately.
To test construct validity, we compared the behavior of HAQ-DI with levels of selected sociodemographic and clinical variables (
Table 4).
3.5. Criterion Validity
To test criterion validity, we correlated HAQ-DI and SHAQ-VAS scores with selected dimensions of UCLA GIT 2.0, SF-36v2, and EQ-5D-5L (
Table 5).
This table shows that the HAQ-DI is mainly correlated with the SF-36v2 physical summary measure, as well as with generic quality of life scores. Regarding the more specific SHAQ VAS indicators, the physical summary measure is also highly correlated with the SHAQ overall disease severity VAS, and the SHAQ pain VAS is correlated with both the SF-36 Bodily Pain and EQ-5D-5L Pain/Discomfort dimensions. Finally, the SHAQ intestinal VAS was highly correlated with the UCLA GIT total score and the Distension/bloating dimension.
Looking at clinical indicators, we investigated whether the scores obtained by the SHAQ VAS could be considered as determinants of relevant clinical variables (
Table 6).
4. Discussion
Impairment in physical and psychosocial quality of life in patients with multisystem involvement associated with SSc is a major concern in clinical practice. However, there is a lack of tools to evaluate disease activity, predict outcomes, and measure changes during the course of the disease. Self-administered questionnaires represent a standardized tool to evaluate consequences in daily life activities and assess patients’ perspectives of disease severity.
The internal consistency of the HAQ-DI was high (α = 0.866). We also observed good test–retest reliability scores for the HAQ-DI and five of the SHAQ VASs (pain, intestinal, breathing, finger ulcers, and overall disease severity). The SHAQ Raynaud’s phenomenon VAS had an ICC less than the recommended 0.7, probably due to the high variability of this phenomenon, which can depend on external factors like climate and stress. A similar result has been found by other authors [
12].
On the other hand, the most severe SSc subtype is represented by the diffuse cutaneous form with a large proportion of patients with multiorgan involvement and consequently a greater disability in daily living tasks and worse scores in the HAQ-DI [
8]. Regarding construct validity, and similar to other studies, we found statistically significant differences in HAQ-DI in patients with dcSSc [
8,
13,
15]. As in some previous validations, we did not find differences regarding disease duration since diagnosis [
10,
11,
13,
17]. This may be explained by the fact that most patients had a disease duration of less than five years. Male patients and older patients (more than 65 years old) had significantly worse HAQ-DI scores. In previous studies, no differences were reported regarding gender. The Japanese validation, however, also reported worse HAQ-DI scores in older patients [
15].
Criterion validity was mainly evidenced through the correlation between the HAQ-DI and SF-36v2 physical summary measure (r = −0.688) and EQ-5D-5L index score (r = −0.723). Likewise, the SHAQ overall disease severity VAS was also correlated with the SF-36v2 physical summary measure (r = 0.628). Because SHAQ is a disability measure, mental score correlations were smaller [
10,
11,
12,
13,
14,
17,
18]. With the exception of the Raynaud’s VAS, all the other VASs correlated well with similar clinical variables.
Although we complied with the minimum sample size required to validate a measurement instrument, we consider that it would be advantageous to replicate the study with a larger sample. This would probably provide greater variability in the analyzed variables.
In particular, a larger sample size would increase the probability of recruiting participants with a longer disease duration. In this study, we recruited consecutive patients from five Portuguese Hospital Centers, and almost 57% of these patients presented with less than 5 years of disease duration. Longer disease duration might be associated with more complications and impaired quality of life; however, we believe that this does not affect the validation process.
On the other hand, we know that the diffuse form of SSc is less prevalent in the population, occurring in about one-fourth to one-third of patients with systemic sclerosis. A larger sample might also include more patients with the diffuse subtype, but this difference in prevalence would remain.