Article

A Psychometric Evaluation of the Dysphagia Handicap Index Using Rasch Analysis

by Reinie Cordier 1,2,3, Annette Veronica Joosten 4, Bas J. Heijnen 5 and Renée Speyer 2,5,6,7,*

1 Department of Social Work, Education and Community Wellbeing, Northumbria University, Newcastle upon Tyne NE1 8ST, UK
2 Curtin School of Allied Health, Curtin University, Perth, WA 6102, Australia
3 Department of Health & Rehabilitation Sciences, Faculty of Health Sciences, University of Cape Town, Cape Town 7700, South Africa
4 School of Allied Health, Australian Catholic University, Melbourne, VIC 3065, Australia
5 Department of Otorhinolaryngology and Head and Neck Surgery, Leiden University Medical Centre, 2333 ZA Leiden, The Netherlands
6 Department Special Needs Education, University of Oslo, NO-0371 Oslo, Norway
7 MILO Foundation, Centre for Augmentative and Alternative Communication, 5482 JH Schijndel, The Netherlands
* Author to whom correspondence should be addressed.
J. Clin. Med. 2024, 13(8), 2331; https://doi.org/10.3390/jcm13082331
Submission received: 29 February 2024 / Revised: 6 April 2024 / Accepted: 10 April 2024 / Published: 17 April 2024
(This article belongs to the Section Otolaryngology)

Abstract

Background/Objectives: The Dysphagia Handicap Index (DHI) is commonly used in oropharyngeal dysphagia (OD) research as a self-report measure of functional health status and health-related quality of life. The DHI was developed and validated using classic test theory. The aim of this study was to use item response theory (Rasch analysis) to evaluate the psychometric properties of the DHI. Methods: Prospective, consecutive patient data were collected at dysphagia or otorhinolaryngology clinics. The sample included 256 adults (53.1% male; mean age 65.2) at risk of OD. The measure’s response scale, person and item fit characteristics, differential item functioning, and dimensionality were evaluated. Results: The rating scale was ordered but showed a potential gap in the rating category labels for the overall measure. The overall person (0.91) and item (0.97) reliability was excellent. The overall measure reliably separated persons into at least three distinct groups (person separation index = 3.23) based on swallowing abilities, but the subscales showed inadequate separation. All infit mean squares were in the acceptable range except for the underfitting for item 22 (F). More misfitting was evident in the Z-Standard statistics. Differential item functioning results indicated good performance at an item level for the overall measure; however, contrary to expectation, an OD diagnosis presented only with marginal DIF. The dimensionality of the DHI showed two dimensions in contrast to the three dimensions suggested by the original authors. Conclusions: The DHI failed to reproduce the original three subscales. Caution is needed using the DHI subscales; only the DHI total score should be used. A redevelopment of the DHI is needed; however, given the complexities involved in addressing these issues, the development of a new measure that ensures good content validity may be preferred.

1. Introduction

Oropharyngeal dysphagia (OD) or deglutition disorders are associated with dehydration, malnutrition, aspiration pneumonia, and even death [1,2,3,4]. Apart from affecting physical well-being, dysphagia has a major impact on a person’s quality of life [5,6]. Therefore, self-report measures are important for dysphagia assessment [7,8].
Patient self-evaluation comprises two different aspects: functional health status (FHS) and health-related quality of life (HR-QoL) [4,8]. FHS is the impact of a given disease on the ability to perform tasks in multiple domains (including physical, social, role, and psychological functioning). The FHS aims to quantify the symptomatic severity and (loss of) function due to the disease and/or treatment and the effects on daily life as experienced by the individual at a particular point in time [7]. HR-QoL refers to the unique personal perception of someone’s health, considering social, functional, and psychological issues [9]. Although considered two distinct concepts, self-evaluation questionnaires frequently combine both FHS and HR-QoL, without distinguishing disease-related functioning from disease-related quality of life as experienced by the patient [10].
A measure’s robust psychometric properties must be demonstrated before implementing it in healthcare or research [11,12]. Two systematic reviews summarising the evidence for the measurement properties of patient self-reported measures developed for people with dysphagia reported poor and incomplete psychometric data [13,14]. Nearly all studies apply the principles of Classic Testing Theory (CTT) when evaluating the psychometric robustness of measures, while only a few studies use the more contemporary item response theory (IRT) framework [15,16,17]. CTT and IRT are the most common frameworks used in instrument development and the evaluation of psychometric properties [18]. CTT analyses evaluate the performance of a measure as a whole, whereas the IRT framework uses the item as the unit of analysis. Also, contrary to CTT, results in IRT are not bound by the test population [11,19]. Consequently, psychometric studies using CTT analyses may yield different results than studies incorporating IRT principles and, therefore, may lead to different recommendations or guidelines about which measures to implement in clinics or research [10,15,20].
Commonly used self-report measures for patients with dysphagia include the MD Anderson Dysphagia Inventory (MDADI; [21]), the Swallowing Quality of Life Questionnaire (SWAL-QOL; [22]), the Eating Assessment Tool (EAT-10; [23]) and the Dysphagia Handicap Index (DHI; [24]). To date, only two self-reported measures targeting people with oropharyngeal dysphagia have been evaluated using IRT analyses: the SWAL-QOL [15] and the EAT-10 [10,20,25]. In contrast to previous studies reporting on both measures’ good validity and reliability using CTT, more recent studies using IRT analyses identified major psychometric weaknesses in both measures, calling for further evaluation of the underlying structure and possible redevelopment using IRT.
Overall, both CTT and IRT principles should be considered when developing new instruments and evaluating the psychometric properties of existing measures. Repeating limited CTT analyses for a single measure (repeated cross-cultural validation of a measure into numerous languages; for example, [26]) may not further strengthen the psychometric evidence to support its use, while introducing the IRT framework alongside CTT principles will lead to a better understanding of the robustness of the psychometric properties of a measure, prioritising the quality over quantity of psychometric analyses.
The Dysphagia Handicap Index (DHI) was originally developed and validated by Silbergleit et al. [24] using CTT. The DHI is a patient-administered questionnaire comprising 25 items across three subscales: the emotional (7 items), functional (9 items), and physical aspects of individuals’ lives (9 items) [24]. Items are scored using a three-point ordinal scale (i.e., never = 0; sometimes = 2; or always = 4), with higher scores indicating a higher degree of disability or impact on patients’ quality of life. The questionnaire concludes with a single question on the patients’ perceived severity of dysphagia using a seven-point scale with three anchor values (1 = normal swallowing; 4 = moderate swallowing problem; and 7 = severe swallowing problem). The item descriptions are provided in Table 1.
Since its publication, several studies have examined the psychometric properties of the DHI using CTT analyses; none have used IRT principles. For example, many studies evaluated hypothesis testing (e.g., convergent validity; [27,28]) and cross-cultural validity (e.g., [29,30]), while minimal data on responsiveness can be obtained from the literature, and no data on measurement error and structural validity have been published.
To address the gap in research, this study aimed to apply an IRT approach to determine the psychometric robustness of the DHI. Using the Rasch measurement model, this study evaluated the response scale, the person and item fit characteristics, differential item functioning, and the scale’s dimensionality.
Table 1. Dysphagia Handicap Index items and domains.
| Item # | Domain | Item Description |
|---|---|---|
| 1 | 1P | I cough when I drink liquids. |
| 2 | 2P | I cough when I eat solid food. |
| 3 | 3P | My mouth is dry. |
| 4 | 4P | I need to drink fluids to wash food down. |
| 5 | 5P | I’ve lost weight because of my swallowing problem. |
| 6 | 1F | I avoid some foods because of my swallowing problem. |
| 7 | 2F | I have changed the way I swallow to make it easier to eat. |
| 8 | 1E | I am embarrassed to eat in public. |
| 9 | 3F | It takes me longer to eat a meal than it used to. |
| 10 | 4F | I eat smaller meals more often due to my swallowing problem. |
| 11 | 6P | I have to swallow again before food will go down. |
| 12 | 2E | I feel depressed because I cannot eat what I want. |
| 13 | 3E | I do not enjoy eating as much as I used to. |
| 14 | 5F | I do not socialise as much due to my swallowing problem. |
| 15 | 6F | I avoid eating because of my swallowing problem. |
| 16 | 7F | I eat less because of my swallowing problem. |
| 17 | 4E | I am nervous because of my swallowing problem. |
| 18 | 5E | I feel handicapped because of my swallowing problem. |
| 19 | 6E | I get angry at myself because of my swallowing problem. |
| 20 | 7P | I choke when I take my medication. |
| 21 | 7E | I am afraid that I will choke and stop breathing because of my swallowing problem. |
| 22 | 8F | I must eat another way (e.g., feeding tube) because of my swallowing problem. |
| 23 | 9F | I’ve changed my diet due to my swallowing problem. |
| 24 | 8P | I feel a strangling sensation when I swallow. |
| 25 | 9P | I cough up food after I swallow. |
Notes. Item description from Silbergleit, Schultz [28]; Blue = physical items; green = functional items; pink = emotional items.

2. Methods

2.1. Participants and Procedure

Prospective, consecutive patient data were collected from January 2017 to February 2018 at clinics for dysphagia or otorhinolaryngology at the Leiden University Medical Center, the Netherlands. Only adult patients (i.e., 18 years and older) at risk of dysphagia and who underwent either a videofluoroscopic swallowing study (VFSS) or a fiberoptic evaluation of swallowing (FEES) were included in this study. Patients with severe cognitive problems or esophageal dysphagia were excluded.
All patients completed the DHI independently, after which a VFSS or FEES was performed as part of standard clinical care. The diagnosis of OD was confirmed through a visuoperceptual evaluation of VFSS or FEES recordings by an experienced speech and language pathologist and/or laryngologist. In addition, patient characteristics (age and gender) and oral intake data (i.e., Functional Oral Intake Scale [FOIS]) [31], as completed by the speech and language pathologist, were collected.
In line with COSMIN criteria for adequate sample size for psychometric studies [32], the sample size needed to be five times the number of items, with a minimum sample size of 100. This study was approved by the local Medical Ethics Committee Leiden (approval code: G16.100; date: 17 January 2017) at the Leiden University Medical Center.
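The sample-size rule above can be written as a one-line check. This is a minimal sketch for illustration only; the function name and parameter names are assumptions and are not part of COSMIN or of the study's analysis.

```python
def min_sample_size(n_items: int, per_item: int = 5, floor: int = 100) -> int:
    """Smallest acceptable sample: `per_item` respondents per item, never below `floor`."""
    return max(per_item * n_items, floor)


# The DHI has 25 items, so the rule requires max(5 * 25, 100) respondents.
print(min_sample_size(25))  # 125
```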

2.2. Instrument

In 2012, a prototype patient self-report DHI was developed based on a composite series of 60 complaints from dysphagia patients over a one-month period [24]. Thirty-five items were eliminated (i.e., item total correlations r < 0.50 [n = 21] or redundancy/similar wording [n = 14]). Four items with low item total correlations were included in the final DHI as they were considered by the authors to have high content validity or provide pertinent clinical information. The final DHI version was subsequently reduced to 25 items across three subscales: an emotional (7 items), a functional (9 items), and a physical subscale (9 items). The authors chose three response levels to facilitate patients’ understanding of response requirements and added a final item on dysphagia severity as perceived by the patient.

2.3. Statistical Analysis

Rasch analyses were employed to evaluate the reliability and validity of the DHI. Winsteps version 3.92.0 [33] was used to analyse the data, using joint maximum likelihood estimation with the rating scale model [34]. The first step was to analyse all 25 DHI items. An iterative process was then used to remove poor-fitting items in various combinations and re-run the analysis to obtain the best overall item fit, person separation, and dimensionality statistics. All investigations included the analyses as described below. Figure 1 provides a schematic representation of all the Rasch domains that were evaluated.

2.4. Rating Scale Validity

To confirm whether the ordinal response scale for all items stays true to the assumption that higher ratings indicate “more” and lower ratings indicate “less” within the DHI measure, a Rating Scale Model (RSM) was used to examine the rating scale validity. The three situations in which the partial credit model in Winsteps can be used [34] do not apply to the DHI scale structure, and all DHI items have the same scale structure. To align with the DHI response options, the original categories (i.e., never = 0; sometimes = 2; or always = 4) were recoded as Never (0), Sometimes (1), and Always (2) to comply with Rasch requirements for an ordinal scale [19].
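The recoding step described above can be sketched as follows. This is an illustrative example only: the mapping mirrors the values given in the text, while the function names and the example responses are hypothetical and do not come from the study data or from Winsteps.

```python
# Mapping from the published DHI scoring (0/2/4) to consecutive ordinal categories (0/1/2).
RECODE = {0: 0, 2: 1, 4: 2}


def recode_item(raw: int) -> int:
    """Recode one raw DHI response; reject anything that is not 0, 2, or 4."""
    if raw not in RECODE:
        raise ValueError(f"unexpected DHI response: {raw!r}")
    return RECODE[raw]


def recode_responses(raw_responses):
    """Recode a whole response vector (one value per item)."""
    return [recode_item(r) for r in raw_responses]


example = [0, 2, 4, 4, 0]  # hypothetical answers to five items
print(recode_responses(example))  # [0, 1, 2, 2, 0]
```

With 25 items recoded this way, a respondent answering "always" to every item scores 25 × 2 = 50.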
Category response data were examined for an even distribution or category disorder to determine if the rating response scales were being used in an expected manner. Non-uniformity or category disordering may occur when poorly designed items that do not measure the construct are included. Average measure scores that increase monotonically as the category increases indicate ordered categories. Misfitting categories and disordering, indicated by mean squares (MnSq) outside 0.7–1.4, can be considered for collapsing into an adjacent category [19].
To assess step disordering, Andrich thresholds were used to estimate the point of equal probability of response in either of two adjacent categories. Andrich thresholds measure the distance between categories, and this distance is expected to progress monotonically, without overlap and without too large a gap. Where step disordering is identified, the category may define a narrow section of the variable, but step disordering does not imply that the category definitions are out of sequence. On a 5-category scale, an increase of at least 1.0 logit indicates distinct categories within the measure. An increase of >5.0 logits indicates gaps in the variable [35].
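The threshold checks described above can be sketched in a few lines. This is an illustrative helper, not Winsteps output or the authors' code: the function name and status labels are assumptions, and the 1.0-logit lower bound is the guideline quoted for a 5-category scale, used here only as a default. The ±4.69 values are the overall-scale thresholds reported in the Results.

```python
def check_thresholds(thresholds, min_advance=1.0, max_advance=5.0):
    """Compare successive Andrich thresholds (in logits, lowest step first).

    Returns one (advance, status) pair per adjacent pair of thresholds.
    """
    report = []
    for lower, upper in zip(thresholds, thresholds[1:]):
        advance = upper - lower
        if advance <= 0:
            status = "disordered"   # thresholds do not increase
        elif advance < min_advance:
            status = "narrow"       # categories may not be distinct
        elif advance > max_advance:
            status = "gap"          # possible gap in the variable
        else:
            status = "ok"
        report.append((round(advance, 2), status))
    return report


print(check_thresholds([-4.69, 4.69]))  # [(9.38, 'gap')]
```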

2.5. Person and Item Fit Statistics

Fit statistics, reported as log-odds units (logits), were used to assess construct validity. Patterns of responses for each person and misfitting items were analysed to determine the reliability of an individual’s responses. Logits also indicate whether the items contribute to the main construct (i.e., swallowing difficulty). Infit and outfit are both described as unstandardised MnSq or Z-Standard (Z-STD) statistics. Infit and outfit MnSqs should be close to 1.0 with an acceptable range of 0.7–1.4 [36]. Infit and outfit Z-STD statistics should be close to 0 with an acceptable range of ±2 [36]. Where underfitting is found, further investigation is required to understand the reason. Though underfitting degrades the model, the same is not always true of overfitting; however, caution must still be used to avoid misinterpreting that the model has worked better than expected [36].
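The cut-offs above translate directly into a small classification rule. The sketch below is illustrative only; the function names are assumptions, and the thresholds are simply the ranges quoted in the text (MnSq 0.7–1.4; Z-STD ±2).

```python
def classify_mnsq(mnsq: float, low: float = 0.7, high: float = 1.4) -> str:
    """Classify an infit or outfit mean square against the acceptable range."""
    if mnsq > high:
        return "underfit"   # more variation than the model expects
    if mnsq < low:
        return "overfit"    # less variation than the model expects
    return "acceptable"


def classify_zstd(zstd: float, limit: float = 2.0) -> str:
    """Classify a standardised (Z-STD) fit statistic against the +/- limit."""
    if zstd > limit:
        return "underfit"
    if zstd < -limit:
        return "overfit"
    return "acceptable"


print(classify_mnsq(1.46))   # underfit
print(classify_zstd(-2.23))  # overfit
```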
Person reliability, the IRT equivalent to Cronbach’s alpha, is used to evaluate the internal consistency of the measure. Low values (<0.8) suggest that the measure has too few items or reduced variability in responses (i.e., there are few people with responses in the high or low ranges, indicating more extreme abilities).
To distinguish high performers (in swallowing) from low performers (in swallowing), person separation determines whether the test separates the sample into sufficient levels. When identified as accidental responses, outliers are managed using person separation. For clusters that represent true performances, people are classified using the person separation index (PSI)/strata, where strata = (4 × person separation + 1)/3. When person separation is low, it can be assumed that the measure is not sensitive enough to separate low and high performers. Reliability values of 0.5, 0.8, and 0.9, respectively, indicate separation into only one or two levels, 2–3 levels, and 3–4 levels [19]. To consistently identify three performance levels, a PSI/strata of 3 is required (the minimum level to attain a reliability of 0.9). Item reliability is used to verify the item hierarchy: if the hierarchy has <3 levels (high, medium, low) and item reliability is <0.9, then the sample is too small to confirm the measure’s construct validity (item difficulty).
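As a worked illustration of these quantities, the sketch below uses the standard Rasch relationships strata = (4 × separation + 1)/3 and reliability = separation² / (1 + separation²). The function names are assumptions for this example; the input values 3.23 and 5.76 are the overall person and item separation values reported in the Results.

```python
import math


def strata(separation: float) -> float:
    """Number of statistically distinct levels implied by a separation value."""
    return (4.0 * separation + 1.0) / 3.0


def reliability_from_separation(separation: float) -> float:
    """Reliability implied by a separation value: G^2 / (1 + G^2)."""
    return separation ** 2 / (1.0 + separation ** 2)


def separation_from_reliability(reliability: float) -> float:
    """Inverse relationship: G = sqrt(R / (1 - R))."""
    return math.sqrt(reliability / (1.0 - reliability))


print(round(strata(3.23), 2))                       # 4.64
print(round(reliability_from_separation(3.23), 2))  # 0.91
print(round(reliability_from_separation(5.76), 2))  # 0.97
print(round(separation_from_reliability(0.9), 2))   # 3.0
```

The last line shows why a separation of 3 corresponds to the reliability benchmark of 0.9 mentioned above.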

2.6. Differential Item Analysis

A differential item functioning (DIF) analysis was performed to examine whether the scale items were used in the same way by all groups. DIF occurs when a characteristic other than the swallowing difficulty being assessed influences the rating of an item [36]. The DIF analysis was performed on all 25 items. We tested DIF in variables where we expected DIF (e.g., OD vs. no OD) and in variables where we did not expect DIF (e.g., sex). The sample was categorised by age (18–39 years vs. 40–59 years vs. 60–69 years vs. 70–79 years vs. >80 years), participant category (OD vs. no OD), sex (male vs. female), diagnostic category (neurological disorders vs. head and neck oncology vs. other disorders), and swallowing difficulty according to FOIS (nothing by mouth vs. tube dependent with minimal attempts of food or liquid vs. tube dependent with consistent oral intake of food or liquid vs. total oral diet of a single consistency vs. total oral diet with multiple consistencies, requiring special preparation or compensations vs. total oral diet with multiple consistencies without special preparation, but with specific food limitation vs. total oral diet with no restrictions).
These were variables of interest based on the current literature about OD. In addition, given that the DHI is a measure of swallowing difficulties, we needed to establish if it could detect differences in performance for those with and without swallowing difficulties, as we would expect this would impact their DHI scores [24]. Patients with neurological disorders (e.g., stroke, acquired brain injury, Parkinson’s disease, multiple sclerosis, cerebral palsy, or Alzheimer’s disease; [37]), head and neck cancer [38], and other disorders (e.g., structural deficits of the oral cavity, pharynx, or larynx; [39]) have been found to have poorer swallowing outcomes.
A significant DIF on a large number of items can indicate item bias. DIF based on age would be expected for older patients [40]. In terms of sex, previous research found that men and women experience similar rates of swallowing difficulty; as such, we do not expect DIF [41]. Swallowing difficulty as classified using the FOIS is expected to show DIF for those with more severe swallowing difficulty [42], as well as DIF for those diagnosed with OD using VFSS or FEES, compared to those without OD [5].
Differential item functioning contrast refers to the difference in difficulty of the item between both groups. Concerning the hypothesis ‘this item has the same difficulty for two groups’, DIF is noticeable when the DIF contrast (the reporting of the effect size in Winsteps) is at least 0.5 logits with a p-value < 0.05. The combination of DIF contrast (of at least 0.5 logits) and the p-value (<0.05) needs to be present, as statistical significance can be affected by sample size, and the sample size may not be large enough to exclude the possibility of being accidental [19]. Inspection results of the direction of the logits in the DIF contrast scores indicate the difficulty of the item in comparison to what was expected (i.e., positive logits indicate that the item was more difficult than expected [lower scores] and negative logits indicate that the item was easier (higher scores)). In determining DIF when comparing more than two groups (i.e., age, diagnoses, FOIS levels, and DHI severity) with the hypothesis ‘this item has no overall DIF across all groups’, the chi-square statistic and p-value < 0.05 are used [19]. There are two DIF methods used within Winsteps. The Mantel method is used for polytomous data, which are complete or almost complete. The Mantel–Haenszel method is used for uniform DIF analysis of complete or incomplete dichotomous data; for incomplete or sparse data, it uses a logistic uniform DIF method to estimate the difference between the Rasch item difficulties for the two groups, holding everything else constant. To overcome the limitation of incomplete data, Mantel/Mantel–Haenszel in Winsteps are (log-)odds estimators of the DIF size and significance based on the cross-tabulation of the observations of the two groups and use theta to stratify the data. Mantel and Mantel–Haenszel do not require a large sample [43], so they are suitable for our sample size. 
Winsteps also employs a non-uniform DIF logistic technique and a graphical non-uniform DIF approach. We used the Mantel and Mantel–Haenszel tests as they are considered the most authoritative for DIF analyses of dichotomous and polytomous variables [33].
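The two-group decision rule described above (a DIF contrast of at least 0.5 logits together with p < 0.05) can be sketched as follows. The function name and the example values are hypothetical; the code only illustrates the combined criterion and does not reproduce the Mantel or Mantel–Haenszel computations performed in Winsteps.

```python
def flag_dif(contrast_logits: float, p_value: float,
             min_contrast: float = 0.5, alpha: float = 0.05) -> bool:
    """Flag noticeable DIF only when BOTH effect size and significance criteria hold."""
    return abs(contrast_logits) >= min_contrast and p_value < alpha


# Hypothetical item-level results: (DIF contrast in logits, p-value)
examples = {
    "large and significant": (0.62, 0.01),
    "large but not significant": (0.62, 0.20),
    "significant but small": (0.30, 0.001),
    "negative contrast": (-0.70, 0.03),
}
for label, (contrast, p) in examples.items():
    print(label, flag_dif(contrast, p))
```

Requiring both conditions guards against flagging trivially small contrasts that reach significance, or large contrasts that may be accidental.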

2.7. Dimensionality of the Scale

There are a number of ways to assess dimensionality, including (a) using negative point-biserial correlations to identify problematic items; (b) using Rasch fit indicators to identify misfitting items or persons; and (c) employing Rasch factor analysis using principal component analysis (PCA) on the standardised residuals [44]. A PCA of residuals checks the number of principal components to confirm that there are no second or further dimensions after the intended or Rasch dimension is removed. Where residuals for pairs of items are uncorrelated and normally distributed, it can be assumed that no second dimension is present. To determine if further dimensions in the residuals are present, the following criteria are recommended: (a) the Rasch factor uses a cut-off >60% of the explained variance; (b) on first contrast, an eigenvalue of <3 (equivalent to three items) is used; and (c) a first contrast of <10% of the explained variance is used [19].
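The three recommended criteria can be combined into a simple checklist. This is a minimal sketch under stated assumptions: the function and key names are invented for illustration, criterion (c) is read as the percentage of variance attributed to the first contrast, and the example inputs are hypothetical rather than results from this study.

```python
def residual_pca_checks(rasch_variance_pct: float,
                        first_contrast_eigenvalue: float,
                        first_contrast_pct: float) -> dict:
    """Apply the three criteria for judging whether a further dimension is present."""
    checks = {
        "rasch_variance_above_60pct": rasch_variance_pct > 60.0,
        "first_contrast_eigenvalue_below_3": first_contrast_eigenvalue < 3.0,
        "first_contrast_below_10pct": first_contrast_pct < 10.0,
    }
    # Unidimensionality is supported only if every criterion is met.
    checks["unidimensional"] = all(checks.values())
    return checks


# Hypothetical values for illustration only.
print(residual_pca_checks(rasch_variance_pct=55.0,
                          first_contrast_eigenvalue=3.4,
                          first_contrast_pct=6.0))
```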
Distributions of a person’s abilities and item difficulties are represented using the person–item dimensionality map, using a logit scale framework. For this paper, person ability refers to a person’s self-rated ability to swallow. Items on the DHI that are rarely given a high rating, because very few people with swallowing problems endorse them, are classified as “difficult” items. In contrast, “easy” items might refer to aspects of swallowing that occur regularly and will receive high self-ratings. Where two or more items represent similar difficulty, they will be placed in the same location on the logit scale. Gaps in the item difficulty continuum are identified when persons are represented with no corresponding item. Another indication of the overall distribution is the person measure score, using a mean measure score of 50 to determine the location on the person item map. A centralised item mean score of lower than 50 implies that people in the sample were more able than the level of difficulties in the items; higher than 50 indicates a lower ability than the mean item difficulty.

3. Results

The sample of 256 records from people at risk of OD was used for Rasch analyses, thus meeting the COSMIN criteria of an adequate sample size (more than five times the number of items [5 × 25 = 125], and a minimum sample of 100) [32]; 53.1% were male, and 46.9% were female, with an overall mean age of 65.2 years (SD 14.2; range 18–96 years). A total of 188 patients with confirmed OD and 68 patients without OD were included. About one-third of patients were diagnosed with neurological disorders, one-third with head and neck oncology, and the remaining patients reported dysphagia due to other medical causes (e.g., dysphagia after surgery or presbyphagia). Oral intake data (FOIS) and dysphagia severity (DHI) data showed a wide spread of swallowing ability. No data were missing except for the DHI severity scale (missing data: 13/256; 5.1%). The participants’ demographic information is reported in Table 2.

3.1. Rating Scale Validity

The Dysphagia Handicap Index (DHI) is a 25-item measure of three domains of quality of life (QoL) related to the physical aspects of dysphagia (9 items), functional aspects (9 items), and emotional aspects (7 items). The respondents rate the extent to which each statement applies to them with scores of (0) for never, (1) for sometimes, and (2) for always. This results in a maximum score of 50, with higher scores indicating poorer swallowing ability. Respondents also rate their perception of the severity of their swallowing difficulty on a scale from 1 (normal) to 7 (severe problems). We first examined the instrument overall, followed by individual analyses of the three subscales, and finally, we completed analyses to test the removal of items to determine if this improved the fit to the model.
We first examined the response category, item and person fit, dimensionality, and DIF for the DHI, and then for each of the subscales, physical, functional, and emotional aspects, and then finally examined the effect of removing each of the most misfitting items in the overall scale.

Category Order

The examination of the response category for the overall instrument revealed that as the category order increased (from 0 to 2), all fit statistics were in the acceptable range (MnSq = 0.7–1.4), with the average measure scores increasing monotonically, indicating three distinct, ordered categories (see Table 3 and Figure 2). The Andrich thresholds reflect the relative frequency of use of the categories, and these were not disordered, but the step difficulty in the categories advanced by >5 logits between categories 1 and 2 (+4.69) (4.69 − (−4.69) = 9.38 logits), indicating a potential gap in the measure of the variable (i.e., in the rating category labels).
We then examined the category order for each of the three subscales. Average measures for the physical and functional subscales increased monotonically, and the examination of the Andrich thresholds revealed they were not disordered but increased by <5 logits between categories 0 and 1 on the functional subscale (−4.80), but by >5 logits on the physical subscale (−7.54). The step difficulty increased by >5 logits between categories 1 and 2 on the functional subscales (+4.80) (4.80 − (−4.80) = 9.60 logits) and on the physical subscale (+7.54) (7.54 − (−7.54) = 15.08 logits). For the emotional subscale, the average measure did not increase monotonically, and the Andrich thresholds were ordered but increased by >5 logits between 0 and 1 (−8.23) and between 1 and 2 (+8.23) (8.23 − (−8.23) = 16.46 logits). The examination of the category fit statistics revealed no categories in the misfit range.

3.2. Person and Item Fit

The summary item and person ability infit and outfit statistics for the 25-item scale were examined (see Table 4). There was a good item reliability estimate (0.97) with an item separation of 5.76, and person reliability was 0.91. The person separation index (PSI) was 3.23, indicating that persons were reliably separated into at least three distinct groups based on the strata of abilities. The subscales’ item reliability estimates were also good (0.96–0.98), with item separation ranging from 4.89 to 7.72, but person reliability was moderate (0.72–0.82). The PSIs were poor for the emotional (1.6) and physical (1.76) subscales, indicating that persons were not separated into at least two levels; for the functional subscale, the PSI was 2.13. An examination of the point measure correlations for all 25 items revealed that they were all positive, indicating that all items contributed to the measurement of the latent variable. This was also the case for point measure correlations of the subscales.
Item fit statistics are provided in Table 5. The MnSq infit and outfit statistics should be close to 1 with an acceptable range of 0.7–1.4, and they are reported as overfitting if <0.7 and underfitting if >1.4 (Bond and Fox 2015 [36]). To fit the model, infit and outfit reported as Z-STD statistics (standardised fit statistics) have an expected outcome of 0 with an acceptable range of ±2. Values above +2 are reported as underfit and values below −2 as overfit (Bond and Fox 2015 [36]). All infit MnSqs were in the acceptable range except for underfit for item 22 (F), and it also had underfit Z-STD statistics. Infit Z-STD values were also underfitting for items 3 (P), 5 (P), 7 (F), and 24 (P). Infit Z-STD values were overfitting for items 2 (P), 12 (E), 13 (E), 14 (F), 15 (F), 16 (F), and 23 (F). Outfit MnSqs were in the desired range except when underfitting in items 3 (P) and 24 (P), and overfitting in items 12 (E), 14 (F), 16 (F), and 18 (E). Outfit Z-STD values were also underfitting (>2) for items 1 (P), 3 (P), 7 (F), 17 (E), 20 (P), 21 (E), and 24 (P) and overfitting for items 12 (E), 13 (E), 14 (F), 15 (F), 16 (F), 18 (E), and 23 (F). When mean squares are acceptable, underfitting and overfitting Z-STD values can be ignored [19].
When examining the person fit statistics, 49 persons had some underfitting MnSqs or Z-STD scores. Twenty-nine persons had both underfitting infit and outfit MnSqs (>1.4). Overall, infit MnSq scores for 35 persons and 22 infit Z-STD scores were underfitting. Sixteen persons had both underfitting infit and outfit Z-STD scores. Overall, infit Z-STD scores were underfitting for 22 persons and the Z-STD outfit scores for 21 persons were underfitting. Infit statistics explain performance better because outfit statistics are sensitive to outlying scores. Too much variation in the responses results in underfitting (MnSq >1.4; Z-STD > 2), and this is the biggest threat to the measure because it can degrade the model (Bond and Fox 2015 [36]). We then examined the item and person fit statistics for each subscale.

3.2.1. Physical Subscale

Items 1, 2, 3, 4, 5, 11, 20, 24, and 25 comprised the physical subscale. All items had MnSq infit and outfit scores in the desired range except item 5 (infit, 1.46; outfit, 1.48), which was also underfitting for both the MnSq (5.05) and Z-STD (4.26) outfit scores. An overfitting Z-STD score (−2.23) was also evident on item 1 and on items 2 (−3.43) and 4 (−2.93), which also had overfitting outfit Z-STD scores (−2.45 and −2.45, respectively).
Fifty-one persons had at least one misfitting infit or outfit statistic. In total, 38 people had underfitting infit and outfit MnSq scores, 12 persons had underfitting Z-STD infit and outfit scores, and 12 persons had both infit and outfit MnSq and Z-STD scores that were underfitting.

3.2.2. Functional Subscale

Items 6, 7, 9, 10, 14, 15, 16, 22, and 23 comprised the functional subscale. All items had MnSq infit scores in the desired range except item 22 (1.41), and the Z-STD infit score was also underfitting (3.18). Items 7 and 9 had underfitting MnSq outfit scores (1.62 and 1.59, respectively) and underfitting Z-STD outfit scores (4.26 and 2.86, respectively). Item 6 had both overfitting infit (−2.02) and outfit (−2.05) Z-STD scores, as did item 23 with an infit Z-STD score of −5.03 and an outfit Z-STD score of −4.62, which was also underfitting on both the MnSq (5.05) and Z-STD (4.26) outfit scores. Overfitting Z-STD scores were also evident on item 1 (−2.23), item 2 (−3.43), and item 4 (−2.93), which also had overfitting outfit Z-STD scores (−2.45 and −2.45, respectively). Items 15 and 16 had overfitting infit Z-STD scores (−2.64 and −2.42, respectively).
Forty-three persons had at least one misfitting infit or outfit statistic. In total, 21 people had underfitting infit and outfit MnSq scores, 10 persons had underfitting Z-STD infit and outfit scores, and 10 persons had both infit and outfit MnSq and Z-STD scores that were underfitting.

3.2.3. Emotional Subscale

Items 8, 12, 13, 17, 18, 19, and 21 comprised the emotional subscale. All items had MnSq infit scores in the desired range, but item 21 had an underfitting outfit MnSq (1.44), and both its infit (3.06) and outfit (3.38) Z-STD scores were underfitting. Item 8 had an underfitting Z-STD infit score (2.38), and both the infit and outfit Z-STD scores for item 12 (−3.96 and −3.46, respectively) and item 18 (−2.87 and −2.72, respectively) were overfitting.
Sixty-three persons had at least one misfitting infit or outfit statistic. Thirty-five persons had underfitting infit and outfit MnSq scores, five persons had both underfitting Z-STD infit and outfit scores, and five persons had both infit and outfit MnSq and Z-STD scores that were underfitting.

3.3. Differential Item Functioning

Differential item functioning (DIF) analysis enabled the examination of potential contrasting item-by-item profiles associated with the following: sex, age, diagnostic category, OD or no OD, FOIS score, and DHI severity score. The summary of the DIF analysis for all 25 items as the overall scale is presented in Table 6. Significantly different responses were most frequently observed for six P items, five F items, and one E item. Significant DIF for sex was observed on two items: 3 (P) and 24 (P); for age on item 4 (P); for diagnostic category on items 24 (P), 6 (F), 20 (P), 22 (F), and 23 (F); for OD vs. no OD on item 24 (P); for FOIS on items 1 (P), 6 (F), 15 (F), 16 (F), 17 (E), 20 (P), 22 (F), and 23 (F); and for DHI severity on items 3 (P), 20 (P), and 25 (P).
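The tally of items showing DIF on the overall scale can be reproduced from the item lists above. The short Python sketch below is a bookkeeping aid only, not a DIF analysis; the variable and function names are hypothetical, and subscale membership is taken from the subscale columns of Table 5.

```python
# Subscale membership as shown in Table 5 (P = physical, F = functional, E = emotional)
SUBSCALE = {}
for item in (1, 2, 3, 4, 5, 11, 20, 24, 25):
    SUBSCALE[item] = "P"
for item in (6, 7, 9, 10, 14, 15, 16, 22, 23):
    SUBSCALE[item] = "F"
for item in (8, 12, 13, 17, 18, 19, 21):
    SUBSCALE[item] = "E"

# Items reported with significant DIF on the overall scale, by grouping variable
DIF_ITEMS = {
    "sex": [3, 24],
    "age": [4],
    "diagnostic category": [24, 6, 20, 22, 23],
    "OD vs. no OD": [24],
    "FOIS": [1, 6, 15, 16, 17, 20, 22, 23],
    "DHI severity": [3, 20, 25],
}


def tally_by_subscale(dif_items):
    """Count each item once, however many variables it shows DIF on."""
    unique_items = {item for items in dif_items.values() for item in items}
    counts = {"P": 0, "F": 0, "E": 0}
    for item in unique_items:
        counts[SUBSCALE[item]] += 1
    return counts


print(tally_by_subscale(DIF_ITEMS))
```

Counting unique items in this way gives six P items, five F items, and one E item, which agrees with the summary reported for the overall scale.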
Differential item functioning was then examined for each of the subscales. As with the overall scale, no items showed significant DIF for all variables; however, three P items (3, 24, and 25) showed DIF on sex, and three P items (1, 2, and 20) showed significant DIF on diagnostic category. On the physical subscale, significant DIF was also evident for age on item 4, for OD vs. no OD on item 24, for FOIS on item 11, and for DHI severity on item 11. For the functional subscale, DIF was evident only for age and DHI severity on item 7, for DHI severity only on item 10, and for FOIS score on items 9, 22, and 23. No DIF was evident on the emotional subscale except for OD vs. no OD and FOIS on item 17.

3.4. Dimensionality

The dimensionality of the overall scale of 25 items was examined using the principal component analysis (PCA) of the residuals (Table 7 and Table 8). Contrasts in the item residuals are examined for dimensions that are not explained by the Rasch dimension. The Rasch dimension explained 42.9% of the variance, with >40% indicating a strong measurement dimension [19]. The examination of the explained variance showed that the item measures (22.3%) explained slightly more of the variance than the person measures (20.7%). However, the unexplained variance (57.1%) was greater than the explained variance. The raw variance explained by the items was only about three times the variance explained by the first contrast (7.8%), indicating a noticeable second dimension. The first contrast had an eigenvalue of 3.43, which exceeds the threshold of two eigenvalue units and confirms that there is a second dimension. The second contrast had an eigenvalue of 2.11 and explained 4.8% of the variance, which is only just above the smallest amount that could indicate the possibility of a third dimension. The PCA divided the items into two groups: one, aligned with the Rasch dimension, comprising items 1, 2, 3, 4, 11, 20, 24, and 25 from the physical (P) subscale and items 17, 19, and 21 from the emotional (E) subscale; and another, forming a second dimension, comprising items 6, 7, 9, 10, 14, 15, 16, 22, and 23 from the functional (F) subscale and items 8, 12, 13, and 18 from the emotional (E) subscale. This would suggest, based on the theoretical logic for QoL, that for people with dysphagia, QoL is affected by physical symptoms and the functional impact on daily life. However, the results related to the dimensionality of the DHI suggest that the emotional impact is intertwined with both physical symptoms (e.g., having an emotional response [fear] to choking) and the functional impact (e.g., having an emotional response [depression] to appearing in public).
As indicated earlier, the point measure correlations were all in a positive direction, indicating that all items contributed to the measurement of the latent variable and should, therefore, be retained.
As presented in Figure 3, the person–item map showed that (a) there was a need for more easy and more difficult items, (b) many people were not aligned against items, and (c) very little item redundancy was evident from items aligning at the same level. Items 8 (E), 14 (F), and 20 (P) aligned; 5 (P), 12 (E), 15 (F), and 24 (P) aligned; 2 (P), 7 (F), and 16 (F) aligned; 3 (P), 18 (E), and 23 (F) aligned; and items 10 (F) and 13 (E) aligned. However, of these, items 5 and 24 were both P items, so one was potentially redundant, and items 7 and 16 were both F items, so one was potentially redundant. The other alignments can be explained because the items belong to different subscales.
We also examined the dimensionality of each of the subscales separately. On the physical subscale, the Rasch dimension explained 38.4% of the variance, with persons and items explaining 18.5% and 20.3%, respectively. There was no evidence of a second dimension, with the first contrast variance being less than two eigenvalue units. On the functional subscale, 54.2% of the variance was explained by the Rasch dimension, with person and item variances of 28.2% and 26%, respectively, and no evidence of a second dimension. On the emotional subscale, 44.7% of the variance was explained by the Rasch dimension, with person and item variances of 23.8% and 20.9%, respectively, and no evidence of a second dimension. As with the overall measure, there was a need for more easy and more difficult items in each subscale. Many people were not aligned against items, and the only potential redundancies were in the physical subscale, with items 5 and 24 aligning at the same level (as observed in the overall scale) and items 1 and 2 aligning at the same level.
This process was then repeated with the removal of the most misfitting items—18 (I feel handicapped because of my swallowing (E)), 24 (I feel a strangling sensation when I swallow (P)), and 3 (my mouth is dry (P))—separately and then of combinations of items 3 and 24 and items 3, 18, and 24. Even though item 22 (I must eat another way (e.g., feeding tube) because of my swallowing problem (F)) was misfitting, it was the only difficult item and was therefore not removed in these analyses. No significant changes were evident, and all models still indicated a second dimension (Table 7 and Table 8) or measures of the two components of QoL impact, physical and functional, with emotional as a third component related to both physical and functional items. However, examining the person-item map revealed that the removal of both items 3 and 24 resulted in only items 7 and 16 showing redundancy.

4. Discussion

4.1. Summary Statistics

The summary statistics for item and person ability for all 25 items were good (i.e., high item and person reliability). When examining the subscales, item reliability estimates were good, but person reliability was moderate with poor person separation indices. As a result, people may not be separated into different levels (i.e., high versus low performers in relation to swallowing), supporting the need for more easy and more difficult items.

4.2. Rating Scale

When examining how the rating scale was used for the overall DHI, there was no disordering in the categories, and all fit statistics were within an acceptable range. However, step difficulty between the categories indicated a potential gap in the measure of the variable. An examination of the category fit statistics per subscale revealed no categories in the misfit range except for the emotional subscale, and gaps related to step difficulty in all three subscales were confirmed. Increasing the number of categories (i.e., response options) and providing clear descriptions of how categories differ from each other can help resolve these findings.

4.3. Person and Item Fit

Overall, more misfitting was evident in the Z-STD statistics, with only one underfitting MnSq infit statistic. Outfit MnSqs outside the acceptable range were mainly overfitting and not in contradiction to the outfit Z-STD scores, with overfitting being less of a threat to the model. However, care needs to be taken so that this is not misinterpreted as ‘the model is working better than expected’.
When using the scale as a whole, all items except item 22 had acceptable mean square infit statistics, and therefore, the underfitting and overfitting of the Z-STD scores are considered less important. Outfit MnSqs were overfitting for four items, but the outfit Z-STD scores were also overfitting. The overfitting of MnSq and Z-STD is less concerning than underfitting. Outfit statistics are unweighted and more sensitive to outliers, so they are often regarded as less important than infit statistics. Although overfitting statistics show that the data were more predictable than the model expects, they do not usually degrade the model. Further, although some items had an MnSq infit or outfit score outside the desired range, suggesting their removal, additional analyses assessing the measure’s dimensionality suggest that the measure may be improved as a two-dimensional model, with these questions retained but with different wording. Item 22 had infit MnSq and Z-STD scores that were underfitting, but this is likely because it concerns the need for an alternative means of feeding (i.e., a feeding tube). It should therefore be retained, as it would likely perform better with a larger sample that included more people using feeding tubes.

4.4. Differential Item Functioning

Differential item functioning analysis was used to examine the potential contrasting item-by-item profiles associated with sex, age, the presence or absence of a confirmed diagnosis of OD, medical diagnosis, FOIS, and DHI severity. Overall, the DIF results indicate good performance at an item level. Theoretically, DIF would be expected on most variables, but not for sex. The most obvious DIF was found for FOIS, suggesting a more optimal representation for the functional manifestation of swallowing problems. In contrast, the presence of dysphagia presented only with marginal DIF. The DIF domains age, medical diagnosis, and DHI severity also showed minor DIF, of which the limited DIF for age could be due to a limited distribution across age groups (i.e., 71.1% of participants were ≥60 years).

4.5. Dimensionality

The PCA of residuals performed to examine the dimensionality of the overall DHI measure indicated that the DHI consisted of two dimensions, in contrast to the three dimensions suggested by Silbergleit et al. [24]; each dimension consisted of items from either the functional or the physical subscale, supplemented with items originating from the emotional subscale. The person–item dimensionality map indicated little item redundancy but an obvious need to generate more easy and more difficult items.
Because Rasch analyses could not confirm the three-dimensional nature of the DHI, it is recommended to avoid the use of subscales and consider only the full measure until dimensionality is addressed through future instrument redevelopment. During the redevelopment process, items from the emotional subscale may be distributed across the other two domains and/or reworded to reflect that physical and functional domains have an emotional element, rather than emotion being a separate dimension.

4.6. Future Recommendations

Failing to reproduce the three subscales suggests that the DHI does not meet content validity criteria. Future studies should focus on meeting criteria for all three aspects of content validity: (1) relevance (i.e., the degree to which all items of a measure are relevant for the construct of interest within a target population and purpose of use); (2) comprehensiveness (i.e., the degree to which all key concepts of the construct are included in a measure); and (3) comprehensibility (i.e., the degree to which items of a measure are easy to understand for respondents) [45].
The redevelopment of the DHI should also address the rephrasing and regrouping of misfitting items and the need to generate new easy and difficult items to improve the separation of people into different levels of low versus high performance (i.e., a low versus high impact of disability on patients’ quality of life). In supporting a two-dimensional model, all emotion items should be reworded to reflect an emotional response to a physical or a functional challenge instead. For example, item 12, ‘I feel depressed because I can’t eat what I want’ can be changed to ‘Not being able to eat what I want makes me feel depressed’, which then becomes a functional item with an emotional component. After the redevelopment of the DHI, the revised measure’s psychometric properties, including its dimensionality, must be determined again using CTT and IRT analyses, preferably in the same, larger sample, meeting current international standards for instrument development.
As solving the current psychometric issues surrounding the DHI is challenging, the development of a new measure that ensures good content validity may be preferred. During the instrument development of patient self-report measures for dysphagia, careful consideration should be given to which constructs should be targeted and whether including the constructs functional health status and health-related quality of life may suffice [7]. In this context, functional health status refers to the impact of dysphagia on the ability to perform tasks in multiple domains (including physical, social, role, and psychological functioning) and aims to quantify the symptomatic severity and (loss of) function due to dysphagia and/or treatment and the impacts on daily life as experienced by patients at a particular point in time [7]. HR-QoL refers to the unique personal perception of someone’s health, taking into account social, functional, and psychological issues [9].
A final consideration is the naming of the measure: the Dysphagia Handicap Index. When redeveloping the DHI, one may consider changing the measure’s name, as the term ‘handicap’ is perceived as outdated and offensive [46]. Instead, preference should be given to recommended language as suggested by, for example, the United Nations [46], which refers to ‘persons with disabilities’ and targeting patients’ capabilities over disabilities.

5. Conclusions

In general, previous studies using CTT to determine the psychometric properties of the DHI confirmed its validity and reliability [24]. However, our current findings using IRT seem to contradict the results of these CTT studies to some degree and highlight the need for continuous instrument development. The main weakness of the DHI is related to the failure to reproduce its three subscales, suggesting that the DHI does not meet the content validity criteria. The DHI has two dimensions, not three, and the items from the emotional subscale should be reworded and integrated with the functional and physical subscales. The redevelopment of the DHI should focus on meeting all criteria for good content validity, address the rephrasing and regrouping of misfitting items, and include new, easy, and difficult items to improve the separation of people into different levels of swallowing ability. Given the complexity of addressing these issues, the development of a new measure that ensures good content validity may be preferable.

Author Contributions

Conceptualisation: R.C., B.J.H. and R.S.; Methodology: R.C., A.V.J. and R.S.; Formal Analysis: R.C. and A.V.J.; Data Curation: B.J.H. and R.S.; Writing—Original Draft Preparation, R.C., A.V.J. and R.S.; Writing—Review and Editing: R.C., A.V.J., B.J.H. and R.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no funding.

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki and approved by the Local Medical Ethics Committee (approval code: G16.100; date: 17 January 2017) at the Leiden University Medical Center, The Netherlands.

Informed Consent Statement

Informed consent was obtained from all participants involved in the study.

Data Availability Statement

Data are not available upon request due to ethical restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Viñas, P.; Martín-Martínez, A.; Cera, M.; Riera, S.A.; Escobar, R.; Clavé, P.; Ortega, O. Characteristics and Therapeutic Needs of Older Patients with Oropharyngeal Dysphagia Admitted to a General Hospital. J. Nutr. Health Aging 2023, 27, 996–1004.
  2. Rajati, F.; Ahmadi, N.; Naghibzadeh, Z.A.S.; Kazeminia, M. The global prevalence of oropharyngeal dysphagia in different populations: A systematic review and meta-analysis. J. Transl. Med. 2022, 20, 175.
  3. Pacheco-Castilho, A.C.; Miranda, R.P.C.; Norberto, A.M.Q.; Favoretto, D.B.; Rimoli, B.P.; Alves, L.B.d.M.; Weber, K.T.; Santos, T.E.G.; Moriguti, J.C.; Leite, J.P.; et al. Dysphagia is a strong predictor of death and functional dependence at three months post-stroke. Arq. Neuro-Psiquiatr. 2022, 80, 462–468.
  4. Speyer, R.; Balaguer, M.; Cugy, E.; Devoucoux, C.; Morinière, S.; Soriano, G.; Vérin, E.; Woisard, V. Expert consensus on clinical decision-making in the disease trajectory of oropharyngeal dysphagia in adults: An international Delphi study. J. Clin. Med. 2023, 12, 6572.
  5. Jones, E.; Speyer, R.; Kertscher, B.; Swan, K.; Wagg, B.; Cordier, R. Health-related quality of life in oropharyngeal dysphagia. Dysphagia 2018, 33, 141–172.
  6. Swan, K.; Speyer, R.; Heijnen, B.J.; Wagg, B.; Cordier, R. Living with oropharyngeal dysphagia: Effects of bolus modification on health-related quality of life—A systematic review. Qual. Life Res. 2015, 24, 2447–2456.
  7. Speyer, R.; Cordier, R.; Denman, D.; Windsor, C.; Krisciunas, G.P.; Smithard, D.G.; Heijnen, B.J. Development of two patient self-reported measures on functional health status (FOD) and health-related quality of life (QOD) in adults with oropharyngeal dysphagia using the Delphi technique. J. Clin. Med. 2022, 11, 5920.
  8. Speyer, R.; Cordier, R.; Farneti, F.; Nascimento, W.; Pilz, W.; Verin, E.; Walshe, M.; Woisard, V. White paper by the European Society for Swallowing Disorders: Screening and non-instrumental assessment for dysphagia in adults. Dysphagia 2022, 37, 333–349.
  9. Ferrans, C.E.; Zerwic, J.J.; Wilbur, J.E.; Larson, J.L. Conceptual Model of Health-Related Quality of Life. J. Nurs. Sch. 2005, 37, 336–342.
  10. Cordier, R.; Joosten, A.; Clavé, P.; Schindler, A.; Bülow, M.; Demir, N.; Serel Arslan, S.; Speyer, R. Evaluating the psychometric properties of the Eating Assessment Tool (EAT-10) using Rasch analysis. Dysphagia 2017, 32, 250–260.
  11. Swan, K.; Speyer, R.; Scharitzer, M.; Farneti, D.; Brown, T.; Woisard, V.; Cordier, R. Measuring What Matters in Healthcare: A Practical Guide to Psychometric Principles and Instrument Development. Front. Psychol. 2023, 18, 1225850.
  12. Prinsen, C.A.; Vohra, S.; Rose, M.R.; Boers, M.; Tugwell, P.; Williamson, P.R.; Terwee, C.B. How to select outcome measurement instruments for outcomes included in a “Core Outcome Set”—A practical guideline. Trials 2016, 13, 449.
  13. Speyer, R.; Kim, J.-H.; Doma, K.; Chen, Y.-W.; Denman, D.; Phyland, D.; Parsons, L.; Cordier, R. Measurement properties of self-report questionnaires on health-related quality of life and functional health status in dysphonia: A systematic review using the COSMIN taxonomy. Qual. Life Res. 2019, 28, 283–296.
  14. Timmerman, A.A.; Speyer, R.; Heijnen, B.J.; Klijn-Zwijnenberg, I.R. Psychometric characteristics of health-related quality-of-life questionnaires in oropharyngeal dysphagia. Dysphagia 2014, 29, 183–198.
  15. Cordier, R.; Speyer, R.; Schindler, A.; Hamdy, S.; Michou, E.; Heijnen, B.J.; Baijens, L.W.J.; Karaduman, A.; Swan, K.; Clave, P.; et al. Using Rasch analysis to evaluate the reliability and validity of the Swallowing Quality of Life questionnaire: An item response theory approach. Dysphagia 2018, 33, 441–456.
  16. Cordier, R.; Speyer, R.; Martinez, M.; Parsons, L. Non-instrumental clinical assessments in oropharyngeal dysphagia: A systematic review on validity and reliability. J. Clin. Med. 2023, 12, 721.
  17. Swan, K.; Speyer, R.; Brown, T.; Cordier, R. Psychometric properties of visuoperceptual measures of videofluoroscopic and fibre-endoscopic evaluations of swallowing: A systematic review. Dysphagia 2019, 34, 2–33.
  18. Cappelleri, J.C.; Jason Lundy, J.; Hays, R.D. Overview of classical test theory and item response theory for the quantitative assessment of items in developing patient-reported outcomes measures. Clin. Ther. 2014, 36, 648–662.
  19. Linacre, J.M. A User’s Guide to Winsteps Rasch-Model Computer Programs: Program Manual 3.92.0; Mesa-Press: Chicago, IL, USA, 2016.
  20. Kean, J.; Bisson, E.F.; Brodke, D.S.; Biber, J.; Gross, P.H. An introduction to item response theory and Rasch analysis: Application using the eating assessment tool (EAT-10). Brain Impair. 2018, 19, 91–102.
  21. Chen, A.Y.; Frankowski, R.; Bishop-Leone, J.; Hebert, T.; Leyk, S.; Lewin, J.; Goepfert, H. The development and validation of a dysphagia-specific quality-of-life questionnaire for patients with head and neck cancer. Arch. Otolaryngol.-Head Neck Surg. 2001, 127, 870–876.
  22. McHorney, C.A.; Robbins, J.; Lomax, K.; Rosenbek, J.C.; Chignell, K.; Kramer, A.E.; Bricker, D.E. The SWAL–QOL and SWAL–CARE outcomes tool for oropharyngeal dysphagia in adults: III. Documentation of reliability and validity. Dysphagia 2002, 17, 97–114.
  23. Belafsky, P.C.; Mouadeb, D.A.; Rees, C.J.; Pryor, J.C.; Postma, G.N.; Allen, J.; Leonard, R.J. Validity and reliability of the Eating Assessment Tool (EAT-10). Ann. Otol. Rhinol. Laryngol. 2008, 117, 919–924.
  24. Silbergleit, A.K.; Schultz, L.; Jacobson, B.H.; Beardsley, T.; Johnson, A.F. The Dysphagia handicap index: Development and validation. Dysphagia 2012, 27, 46–52.
  25. Hansen, T.; Kjaersgaard, A. Item analysis of the Eating Assessment Tool (EAT-10) by the Rasch model: A secondary analysis of cross-sectional survey data obtained among community-dwelling elders. Health Qual. Life Outcomes 2020, 18, 139.
  26. Schindler, A.; de Fátima Lago Alvite, M.; Robles-Rodriguez, W.G.; Barcons, N.; Clavé, P. History and Science behind the Eating Assessment Tool-10 (Eat-10): Lessons Learned. J. Nutr. Health Aging 2023, 27, 597–606.
  27. Hazelwood, R.J.; Armeson, K.E.; Hill, E.G.; Bonilha, H.S.; Martin-Harris, B. Relating Physiologic Swallowing Impairment, Functional Swallowing Ability, and Swallow-Specific Quality of Life. Dysphagia 2023, 38, 1106–1116.
  28. Silbergleit, A.K.; Schultz, L.; Hamilton, K.; LeWitt, P.A.; Sidiropoulos, C. Self-Perception of Voice and Swallowing Handicap in Parkinson’s Disease. J. Park. Dis. 2021, 11, 2027–2034.
  29. Ginocchio, D.; Ninfa, A.; Pizzorni, N.; Lunetta, C.; Sansone, V.A.; Schindler, A. Cross-Cultural Adaptation and Validation of the Italian Version of the Dysphagia Handicap Index (I-DHI). Dysphagia 2022, 37, 1120–1136.
  30. Silva-Carvalho, I.; Martins, A.; Casanova, M.J.; Freitas, S.; Meireles, L. Cross-Cultural Adaptation and Validation of the European Portuguese Dysphagia Handicap Index. Dysphagia 2023, 38, 1072–1079.
  31. Crary, M.A.; Mann, G.D.; Groher, M.E. Initial psychometric assessment of a functional oral intake scale for dysphagia in stroke patients. Arch. Phys. Med. Rehabil. 2005, 86, 1516–1520.
  32. Mokkink, L.B.; Prinsen, C.; Patrick, D.L.; Alonso, J.; Bouter, L.M.; De Vet, H.; Terwee, C.B.; Mokkink, L. COSMIN Methodology for Systematic Reviews of Patient-Reported Outcome Measures (PROMs). User Manual. 2018, pp. 1–78. Available online: https://www.cosmin.nl/wp-content/uploads/COSMIN-syst-review-for-PROMs-manual_version-1_feb-2018-1.pdf (accessed on 1 April 2024).
  33. Linacre, J.M. Winsteps® Rasch Measurement Computer Program. 2016. Available online: https://www.winsteps.com/index.htm (accessed on 1 April 2024).
  34. Wright, B. Rating scale model (RSM) or partial credit model (PCM). Rasch Meas. Trans. 1998, 12, 641–642.
  35. Linacre, J.M. Investigating rating scale category utility. J. Outcome Meas. 1999, 3, 103–122.
  36. Bond, T.; Fox, C. Applying the Rasch Model: Fundamental Measurement in the Human Sciences, 3rd ed.; Routledge: London, UK, 2015.
  37. Dziewas, R.; Allescher, H.-D.; Aroyo, I.; Bartolome, G.; Beilenhoff, U.; Bohlender, J.; Breitbach-Snowdon, H.; Fheodoroff, K.; Glahn, J.; Heppner, H.-J. Diagnosis and treatment of neurogenic dysphagia—S1 guideline of the German Society of Neurology. Neurol. Res. Pract. 2021, 3, 23.
  38. Baijens, W.J.; Walshe, M.; Aaltonen, L.-M.; Arens, C.; Cordier, R.; Cras, P.; Crevier-Buchman, L.; Golusinski, W.; Govender, R.; Grau Eriksen, J.; et al. European Society For Swallowing Disorders—Confederation Of European Otorhinolaryngology Head and Neck Surgery. White Paper: Oropharyngeal dysphagia in head and neck cancer. Eur. Arch. Oto-Rhino-Laryngol. 2021, 278, 577–616.
  39. Rudler, F.; Pineton de Chambrun, G.; Lallemant, B.; Garrel, R.; Pouderoux, P.; Ramdani, M.; Caillo, L.; Reynaud, C.; Valats, J.C.; Blanc, P. Management of the Zenker diverticulum: Multicenter retrospective comparative study of open surgery and rigid endoscopy versus flexible endoscopy. Surg. Endosc. 2023, 37, 7064–7072.
  40. Baijens, L.W.; Clavé, P.; Cras, P.; Ekberg, O.; Forster, A.; Kolb, G.F.; Leners, J.C.; Masiero, S.; Mateos-Nozal, J.; Ortega, O.; et al. European Society for Swallowing Disorders—European Union Geriatric Medicine Society white paper: Oropharyngeal dysphagia as a geriatric syndrome. Clin. Interv. Aging 2016, 11, 1403–1428.
  41. Zheng, M.; Zhou, S.; Hur, K.; Chambers, T.; O’Dell, K.; Johns, M. Disparities in the prevalence of self-reported dysphagia and treatment among U.S. adults. Am. J. Otolaryngol. 2023, 44, 103774.
  42. Otaka, Y.; Harada, Y.; Shiroto, K.; Morinaga, Y.; Shimizu, T. Early swallowing rehabilitation and promotion of total oral intake in patients with aspiration pneumonia: A retrospective study. PLoS ONE 2024, 19, e0296828.
  43. Guilera, G.; Gómez-Benito, J.; Hidalgo, M.D.; Sánchez-Meca, J. Type I error and statistical power of the Mantel-Haenszel procedure for detecting DIF: A meta-analysis. Psychol. Methods 2013, 18, 553.
  44. Linacre, J.M. Detecting multidimensionality: Which residual data-type works best? J. Outcome Meas. 1998, 2, 266–283.
  45. Terwee, C.B.; Prinsen, C.A.C.; Chiarotto, A.; de Vet, H.C.W.; Bouter, L.M.; Alonso, J.; Westerman, M.J.; Patrick, D.L.; Mokkink, L.B. COSMIN Methodology for Assessing the Content Validity of PROMs—User Manual, Version 1.0. COSMIN, Ed.; VU University Medical Center: Amsterdam, The Netherlands, 2018.
  46. United Nations. Disability Inclusive Language Guidelines; The United Nations Office at Geneva: Geneva, Switzerland, 2019.
Figure 1. Rasch analysis (item response theory): domains being evaluated.
Figure 2. Rating scale validity.
Figure 3. Person–Item map.
Table 2. Participant demographics.

| Participant Characteristics | Oropharyngeal Dysphagia Group (n = 188) | No Oropharyngeal Dysphagia Group (n = 68) | Combined Groups (N = 256) |
|---|---|---|---|
| Sex: n (%) | | | |
| Male | 103 (54.8%) | 33 (48.5%) | 136 (53.1%) |
| Female | 85 (45.2%) | 35 (51.5%) | 120 (46.9%) |
| Age (≥18 years): | | | |
| MN (SD) | 66.4 (13.8) | 61.8 (14.8) | 65.2 (14.2) |
| Range | 18–96 | 18–88 | 18–96 |
| Age group: n (%) | | | |
| 18–39 | 7 (3.7%) | 6 (8.8%) | 13 (5.1%) |
| 40–59 | 42 (22.3%) | 19 (27.9%) | 61 (23.8%) |
| 60–69 | 52 (27.7%) | 18 (26.5%) | 70 (27.3%) |
| 70–79 | 58 (30.9%) | 22 (32.4%) | 80 (31.3%) |
| ≥80 | 29 (15.4%) | 3 (4.4%) | 32 (12.5%) |
| Medical diagnosis: n (%) | | | |
| Neurological disorders | 60 (31.9%) | 16 (23.5%) | 76 (29.7%) |
| Head and Neck oncology | 59 (31.4%) | 9 (13.2%) | 68 (26.6%) |
| Other | 69 (36.7%) | 43 (63.2%) | 112 (43.8%) |
| FOIS | | | - |
| All levels: Med (25; 75%) | 6 (5; 6) | 7 (7; 7) | |
| Per level: n (%) | | | |
| 1. Nothing by mouth | 14 (7.4%) | - | - |
| 2. Tube dependent with minimal attempts of food or liquid | 12 (6.4%) | - | - |
| 3. Tube dependent with consistent oral intake of food or liquid | 6 (3.2%) | - | - |
| 4. Total oral diet of a single consistency | 11 (5.9%) | - | - |
| 5. Total oral diet with multiple consistencies, requiring special preparation or compensations | 44 (23.4%) | - | - |
| 6. Total oral diet with multiple consistencies without special preparation, but with specific food limitation | 58 (30.9%) | - | - |
| 7. Total oral diet with no restrictions | 111 (22.9%) | - | - |
| DHI severity (missing data = 13) | (n = 181) | (n = 62) | (N = 243) |
| 1. No difficulty at all | 1 (0.6%) | 8 (12.9%) | 9 (3.7%) |
| 2. | 8 (4.4%) | 8 (12.9%) | 16 (6.6%) |
| 3. | 14 (7.7%) | 2 (3.2%) | 16 (6.6%) |
| 4. Somewhat of a problem | 36 (19.9%) | 14 (22.6%) | 50 (20.6%) |
| 5. | 38 (21.0%) | 10 (16.1%) | 48 (19.8%) |
| 6. | 50 (27.6%) | 15 (24.2%) | 65 (26.7%) |
| 7. The worst problem you could have | 34 (18.8%) | 5 (8.1%) | 39 (16.0%) |

Notes. FOIS = Functional Oral Intake Scale; DHI = Dysphagia Handicap Index; MN = Mean; Med = Median; SD = Standard Deviation.
Table 3. Category function.

| Category | N | % | Average Measures | Infit MnSq | Outfit MnSq | Andrich Thresholds |
|---|---|---|---|---|---|---|
| 0 | 2759 | 43 | −13.52 | 1.05 | 1.10 | None |
| 1 | 1983 | 31 | −2.83 | 0.93 | 1.00 | −4.69 |
| 2 | 1656 | 26 | 8.49 | 0.97 | 1.01 | 4.69 |

Note: Missing data = 2; 0.03%.
Table 4. Item and person summary statistics.

| Analysis | Scales | Item/Person | Rel | Sep | PSI * | Mean Measure | Model SE | Infit MnSq | Infit Z-STD | Outfit MnSq | Outfit Z-STD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | All 25 items | Person | 0.91 | 3.23 | 4.64 | 45.60 | 3.56 | 1.01 | −0.08 | 1.04 | −0.03 |
| | | Item | 0.97 | 5.76 | - | 50.00 | 1.03 | 1.01 | −0.15 | 1.04 | 0.12 |
| 2 | Physical Scale | Person | 0.76 | 1.76 | 2.68 | 47.51 | 5.73 | 1.01 | −0.05 | 1.02 | −0.05 |
| | | Item | 0.96 | 4.89 | - | 50.00 | 1.05 | 1.00 | −0.21 | 1.02 | 0.07 |
| 3 | Function Scale | Person | 0.82 | 2.13 | 3.17 | 47.40 | 6.27 | 1.00 | −0.04 | 1.03 | 0.02 |
| | | Item | 0.98 | 7.72 | - | 50.00 | 1.16 | 1.02 | −0.16 | 1.03 | −0.25 |
| 4 | Emotional Scale | Person | 0.72 | 1.61 | 2.48 | 44.16 | 6.95 | 1.01 | −0.04 | 1.04 | 0.00 |
| | | Item | 0.96 | 4.61 | - | 50.00 | 1.19 | 0.99 | −0.28 | 1.04 | 0.11 |

Notes. * PSI, Person Separation Index/Strata; PSI = [4 × Person Separation + 1]/3. A person strata of 3 (the minimum level to attain a reliability of 0.90) implies that three different levels of performance can be consistently identified using the test for samples like that tested; Rel = reliability; Sep = separation; bold text and thicker lines reflect that each analysis generates both person and item statistics.
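The strata formula given in the notes to Table 4 can be applied directly to the reported person separation values. The Python sketch below does this and, as an additional assumption not stated in the text, uses the standard Rasch relationship between separation and reliability, Rel = Sep² / (1 + Sep²); the function names are hypothetical.

```python
def strata(separation):
    """Person strata from separation: (4 x Sep + 1) / 3, as in the Table 4 notes."""
    return (4 * separation + 1) / 3


def reliability_from_separation(separation):
    """Assumed standard Rasch relation: Rel = Sep^2 / (1 + Sep^2)."""
    return separation ** 2 / (1 + separation ** 2)


# Person separation values reported in Table 4
person_separation = {
    "All 25 items": 3.23,
    "Physical": 1.76,
    "Function": 2.13,
    "Emotional": 1.61,
}

for scale, sep in person_separation.items():
    print(scale, round(strata(sep), 2), round(reliability_from_separation(sep), 2))
```

Under these assumptions, the computed strata (4.64, 2.68, 3.17, and 2.48) and reliabilities (0.91, 0.76, 0.82, and 0.72) match the person rows of Table 4, which illustrates why only the overall measure exceeds three strata.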
Table 5. Individual item fit statistics and principal component analysis for subscales.
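| Item | All 25 Items: Infit MnSq | All 25 Items: Infit Z-STD | All 25 Items: Outfit MnSq | All 25 Items: Outfit Z-STD | All 25 Items: PTM Corr. | Physical: Infit MnSq | Physical: Infit Z-STD | Physical: Outfit MnSq | Physical: Outfit Z-STD | Physical: PTM Corr. | Function: Infit MnSq | Function: Infit Z-STD | Function: Outfit MnSq | Function: Outfit Z-STD | Function: PTM Corr. | Emotional: Infit MnSq | Emotional: Infit Z-STD | Emotional: Outfit MnSq | Emotional: Outfit Z-STD | Emotional: PTM Corr. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.96 | −0.55 | 1.24 | 2.06 | 0.46 | 0.83 | −2.23 | 0.97 | −0.28 | 0.56 | - | - | - | - | - | - | - | - | - | - |
| 2 | 0.80 | −2.68 | 0.86 | −1.35 | 0.59 | 0.75 | −3.43 | 0.79 | −2.45 | 0.64 | - | - | - | - | - | - | - | - | - | - |
| 3 | 1.28 | 3.38 | 1.53 | 4.41 | 0.46 | 1.08 | 1.01 | 1.15 | 1.65 | 0.58 | - | - | - | - | - | - | - | - | - | - |
| 4 | 1.12 | 1.46 | 1.05 | 0.54 | 0.59 | 0.99 | −0.08 | 0.94 | −0.65 | 0.66 | - | - | - | - | - | - | - | - | - | - |
| 5 | 1.25 | 2.94 | 1.17 | 1.42 | 0.55 | 1.46 | 5.05 | 1.48 | 4.26 | 0.51 | - | - | - | - | - | - | - | - | - | - |
| 6 | 0.99 | −0.05 | 0.94 | −0.53 | 0.68 | - | - | - | - | - | 0.82 | −2.02 | 0.75 | 2.05 | 0.78 | - | - | - | - | - |
| 7 | 1.24 | 2.87 | 1.37 | 3.18 | 0.52 | - | - | - | - | - | 1.38 | 3.79 | 1.62 | 4.26 | 0.62 | - | - | - | - | - |
| 8 | 1.16 | 1.87 | 1.18 | 1.35 | 0.55 | - | - | - | - | - | - | - | - | - | - | 1.23 | 2.38 | 1.20 | 1.78 | 0.66 |
| 9 | 1.02 | 0.29 | 1.01 | 0.16 | 0.64 | - | - | - | - | - | 1.31 | 2.79 | 1.59 | 2.86 | 0.66 | - | - | - | - | - |
| 10 | 1.01 | 0.18 | 0.99 | −0.08 | 0.64 | - | - | - | - | - | 1.11 | 1.19 | 1.10 | 0.79 | 0.71 | - | - | - | - | - |
| 11 | 1.09 | 1.09 | 1.03 | 0.28 | 0.63 | 1.10 | 1.17 | 0.96 | −0.35 | 0.66 | - | - | - | - | - | - | - | - | - | - |
| 12 | 0.70 | −4.18 | 0.66 | −3.17 | 0.69 | - | - | - | - | - | - | - | - | - | - | 0.69 | −3.96 | 0.68 | −3.46 | 0.78 |
| 13 | 0.75 | −3.48 | 0.70 | −3.21 | 0.72 | - | - | - | - | - | - | - | - | - | - | 1.03 | 0.39 | 1.16 | 1.51 | 0.72 |
| 14 | 0.80 | −2.65 | 0.69 | −2.75 | 0.67 | - | - | - | - | - | 0.97 | −0.26 | 0.90 | −0.67 | 0.69 | - | - | - | - | - |
| 15 | 0.84 | −2.05 | 0.76 | −2.17 | 0.67 | - | - | - | - | - | 0.78 | −2.64 | 0.75 | −1.93 | 0.74 | - | - | - | - | - |
| 16 | 0.76 | −3.31 | 0.69 | −3.23 | 0.71 | - | - | - | - | - | 0.80 | −2.42 | 0.77 | −1.91 | 0.76 | - | - | - | - | - |
| 17 | 0.90 | −1.16 | 1.33 | 2.24 | 0.55 | - | - | - | - | - | - | - | - | - | - | 0.86 | −1.56 | 1.01 | 0.16 | 0.70 |
| 18 | 0.70 | −4.35 | 0.68 | −3.43 | 0.72 | - | - | - | - | - | - | - | - | - | - | 0.77 | −2.87 | 0.75 | −2.72 | 0.78 |
| 19 | 1.06 | 0.68 | 1.04 | 0.28 | 0.52 | - | - | - | - | - | - | - | - | - | - | 1.05 | 0.56 | 1.01 | 0.13 | 0.65 |
| 20 | 1.06 | 0.77 | 1.30 | 2.20 | 0.47 | 0.87 | −1.65 | 0.92 | −0.76 | 0.60 | - | - | - | - | - | - | - | - | - | - |
| 21 | 1.15 | 1.69 | 1.35 | 2.40 | 0.46 | - | - | - | - | - | - | - | - | - | - | 1.30 | 3.06 | 1.44 | 3.38 | 0.59 |
| 22 | 1.48 | 3.90 | 1.18 | 0.93 | 0.46 | - | - | - | - | - | 1.41 | 3.18 | 1.26 | 1.03 | 0.56 | - | - | - | - | - |
| 23 | 0.81 | −2.65 | 0.74 | −2.72 | 0.71 | - | - | - | - | - | 0.61 | −5.03 | 0.53 | −4.62 | 0.81 | - | - | - | - | - |
| 24 | 1.25 | 2.88 | 1.54 | 3.96 | 0.44 | 1.09 | 1.15 | 1.15 | 1.51 | 0.55 | - | - | - | - | - | - | - | - | - | - |
| 25 | 0.94 | −0.69 | 1.01 | 0.10 | 0.56 | 0.78 | −2.93 | 0.79 | −2.35 | 0.65 | - | - | - | - | - | - | - | - | - | - |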
Notes. MnSq values outside the acceptable range of 0.7–1.4 and outfit Z-STD values that exceed ±2 are interpreted as not fitting the Rasch model [36]; PTM Corr. = point measure correlations; values that are in bold are outside the acceptable range and do not fit the Rasch model.
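The fit criteria in the note (MnSq outside 0.7–1.4; outfit Z-STD beyond ±2) can be expressed as a simple rule. The sketch below is illustrative only: the function names are assumptions, the cut-offs are those stated in the note, and the labels "underfit" (MnSq above the range) and "overfit" (MnSq below the range) follow common Rasch usage rather than wording in the table. The sample rows are the all-item values for items 4, 12 and 22.

```python
def classify_mnsq(mnsq, low=0.7, high=1.4):
    """Label a mean-square value against the acceptable range from the note."""
    if mnsq > high:
        return "underfit"  # more noise than the model expects
    if mnsq < low:
        return "overfit"   # responses more predictable than the model expects
    return "ok"


def item_fit(infit_mnsq, outfit_mnsq, outfit_zstd, z_limit=2.0):
    """Summarise the fit of one item using the criteria in the table note."""
    return {
        "infit": classify_mnsq(infit_mnsq),
        "outfit": classify_mnsq(outfit_mnsq),
        "outfit_zstd_flag": abs(outfit_zstd) > z_limit,
    }


# (infit MnSq, outfit MnSq, outfit Z-STD) for the all-item analysis in Table 5.
sample_items = {
    4: (1.12, 1.05, 0.54),
    12: (0.70, 0.66, -3.17),
    22: (1.48, 1.18, 0.93),
}

results = {item: item_fit(*values) for item, values in sample_items.items()}
for item, result in results.items():
    print(item, result)
```

In this sketch the range limits are treated as inclusive, so an infit MnSq of exactly 0.70 (item 12) is not flagged, whereas its outfit MnSq of 0.66 is.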
Table 6. Summary of DIF analysis.
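| Item | Sex: Mantel–Haenszel Prob. | Sex: Prob. | Sex: DIF Contrast (Effect Size) & | Age: Summary DIF Chi-Squared | Age: Prob. | Age: DIF Contrast (Effect Size) | OD vs. No OD: Mantel–Haenszel Prob. | OD vs. No OD: Prob. | OD vs. No OD: DIF Contrast (Effect Size) # |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0219 | 0.8824 | 2.14 | 1.7218 | 0.7866 | −4.57 | 0.3165 | 0.5737 | −4.98 |
| 2 | 0.4132 | 0.5204 | 1.98 | 0.4891 | 0.9746 | −1.15 | 0.2510 | 0.6164 | −1.85 |
| 3 | 4.8669 | 0.0274 * | 6.85 * | 5.6128 | 0.2295 | −3.57 | 0.0002 | 0.9897 | −4.35 |
| 4 | 0.2014 | 0.6536 | 0.22 | 12.4567 | 0.0142 * | 9.19 * | 0.2211 | 0.6382 | −2.84 |
| 5 | 0.0312 | 0.8597 | 0.00 | 4.5205 | 0.3395 | 14.08 | 0.3449 | 0.5570 | −1.54 |
| 6 | 0.6014 | 0.4380 | 0.49 | 6.8033 | 0.1462 | 11.80 | 1.2673 | 0.2603 | 6.47 |
| 7 | 1.7715 | 0.1832 | −3.82 | 6.7215 | 0.1509 | −4.57 | 1.4411 | 0.2300 | 1.14 |
| 8 | 0.0009 | 0.9761 | −0.55 | 2.5851 | 0.6291 | −4.77 | 0.2077 | 0.6486 | 0.61 |
| 9 | 1.1216 | 0.2896 | −1.85 | 5.1270 | 0.2740 | 0.46 | 0.3166 | 0.5737 | −0.88 |
| 10 | 0.0597 | 0.8069 | −0.48 | 8.0170 | 0.0906 | 13.02 | 0.4347 | 0.5097 | 1.44 |
| 11 | 0.1059 | 0.7449 | −1.14 | 6.2638 | 0.1798 | 7.12 | 0.0129 | 0.9095 | 2.56 |
| 12 | 0.8468 | 0.3575 | −1.37 | 2.3896 | 0.6641 | 0.92 | 1.8811 | 0.1702 | 0.00 |
| 13 | 0.0373 | 0.8469 | 0.66 | 1.2558 | 0.8688 | 1.25 | 0.8347 | 0.3609 | 2.89 |
| 14 | 0.6785 | 0.4101 | −0.92 | 3.4626 | 0.4830 | −4.26 | 0.0536 | 0.8170 | 4.54 |
| 15 | 0.4824 | 0.4874 | −1.91 | 1.4846 | 0.8292 | 4.67 | 0.8328 | 0.3615 | 0.34 |
| 16 | 0.1101 | 0.7400 | −0.44 | 8.0518 | 0.0894 | 13.90 | 0.0697 | 0.7918 | 4.06 |
| 17 | 0.0623 | 0.8029 | −0.56 | 5.1775 | 0.2690 | −11.16 | 1.8024 | 0.1794 | −5.50 |
| 18 | 2.5779 | 0.1084 | −1.36 | 2.9914 | 0.5588 | −5.21 | 0.0091 | 0.9239 | 3.73 |
| 19 | 1.4001 | 0.2367 | −4.45 | 3.2293 | 0.5197 | −4.07 | 2.0638 | 0.1508 | −4.15 |
| 20 | 0.7719 | 0.3796 | 1.41 | 4.7219 | 0.3165 | −8.17 | 0.0333 | 0.8553 | −5.26 |
| 21 | 0.9496 | 0.3298 | 0.22 | 6.6618 | 0.1544 | −8.41 | 2.3633 | 0.1242 | 1.72 |
| 22 | 0.5081 | 0.4760 | −3.12 | 9.0681 | 0.0592 | 7.28 | 3.0012 | 0.0832 | 17.38 |
| 23 | 0.0064 | 0.9363 | −0.69 | 5.4813 | 0.2408 | 12.56 | 2.9663 | 0.0850 | 9.85 |
| 24 | 5.7798 | 0.0162 * | 7.22 * | 6.6305 | 0.1563 | −13.21 | 6.5539 | 0.0105 * | −10.13 * |
| 25 | 0.9615 | 0.3268 | −1.43 | 7.2009 | 0.1253 | −10.65 | 0.0156 | 0.9005 | −2.91 |

| Item | Diagnostic Category: Summary DIF Chi-Squared | Diagnostic Category: Prob. | Diagnostic Category: DIF Contrast (Effect Size) + | FOIS: Summary DIF Chi-Squared | FOIS: Prob. | FOIS: DIF Contrast (Effect Size) $ | DHI Severity: Summary DIF Chi-Squared | DHI Severity: Prob. | DHI Severity: DIF Contrast (Effect Size) £ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 5.7028 | 0.0566 | −5.40 | 19.5947 | 0.0033 * | −8.00 | 15.6503 | 0.0285 * | −12.13 |
| 2 | 3.3906 | 0.1807 | −3.05 | 4.1668 | 0.6540 | −0.71 | 6.7506 | 0.4552 | −10.05 |
| 3 | 2.2554 | 0.3202 | −0.08 | 20.1350 | 0.0026 * | −7.72 | 25.6152 | 0.0006 * | −10.82 |
| 4 | 2.0901 | 0.3480 | 3.79 | 8.9374 | 0.1769 | 1.30 | 11.2011 | 0.1300 | −8.03 |
| 5 | 1.2538 | 0.5312 | −0.09 | 4.7373 | 0.5778 | 4.88 | 7.9177 | 0.3398 | −0.53 |
| 6 | 6.3553 | 0.0408 * | 4.30 | 27.9819 | 0.0001 * | 9.57 | 6.8707 | 0.4423 | −4.19 |
| 7 | 1.8709 | 0.3888 | −3.36 | 4.7594 | 0.5749 | −4.09 | 9.4548 | 0.2215 | −23.96 |
| 8 | 0.0916 | 0.9571 | −0.80 | 7.1079 | 0.3108 | 2.67 | 4.8344 | 0.6801 | −0.53 |
| 9 | 3.6838 | 0.1559 | −2.94 | 12.6267 | 0.0493 * | 3.36 | 4.0174 | 0.7777 | 15.13 |
| 10 | 0.2233 | 0.8959 | 0.97 | 8.4497 | 0.2068 | 9.09 | 9.1300 | 0.2433 | 15.67 |
| 11 | 1.2911 | 0.5213 | −3.21 | 10.2473 | 0.1145 | 5.36 | 5.4449 | 0.6057 | 7.12 |
| 12 | 3.7545 | 0.1505 | 2.85 | 6.8650 | 0.3333 | −0.51 | 6.8843 | 0.4409 | 3.19 |
| 13 | 1.8676 | 0.3894 | 2.09 | 3.7856 | 0.7056 | 11.78 | 6.8097 | 0.4488 | 15.67 |
| 14 | 0.9241 | 0.6278 | 0.96 | 3.1711 | 0.7870 | −4.42 | 12.2744 | 0.0918 | 13.78 |
| 15 | 1.9981 | 0.3646 | −1.02 | 15.3185 | 0.0179 * | −2.44 | 0.3943 | 0.2990 | −4.65 |
| 16 | 1.2585 | 0.5299 | 2.79 | 14.8529 | 0.0214 * | −1.72 | 9.4284 | 0.2232 | 11.82 |
| 17 | 1.9294 | 0.3775 | −2.10 | 12.6429 | 0.0490 * | −9.87 | 3.4994 | 0.8353 | −2.16 |
| 18 | 0.9943 | 0.6059 | 2.28 | 6.6258 | 0.3566 | 1.29 | 11.6305 | 0.1133 | 15.68 |
| 19 | 0.0177 | 0.9920 | 0.26 | 9.5248 | 0.1460 | −3.01 | 3.6010 | 0.8244 | −11.71 |
| 20 | 16.8816 | 0.0002 * | −0.36 | 22.1033 | 0.0012 * | 4.40 | 20.1818 | 0.0052 * | 15.68 |
| 21 | 0.5044 | 0.7769 | −1.05 | 8.6827 | 0.1920 | −7.73 | 8.2005 | 0.3151 | −16.65 |
| 22 | 10.7063 | 0.0046 * | −1.43 | 56.8543 | 0.0000 * | 0.88 | 11.7623 | 0.1086 | −14.55 |
| 23 | 7.7838 | 0.0199 * | 6.45 | 31.6820 | 0.0000 * | 8.13 | 9.9422 | 0.1917 | 9.80 |
| 24 | 30.8835 | 0.0000 * | −0.76 | 25.5090 | 0.0003 * | −0.96 | 8.2663 | 0.3096 | −0.53 |
| 25 | 1.6022 | 0.4453 | −1.11 | 10.9412 | 0.0901 | −2.30 | 14.8209 | 0.0383 * | 3.19 |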
Notes. & Sex (male reference group); age: 18–39 vs. 40–59 vs. 60–69 vs. 70–79 vs. ≥80 (18–39 reference group); # OD status (OD reference group); + neurological disorders vs. head and neck oncology vs. other disorders (neurological disorders reference group); $ FOIS (nothing by mouth reference group); £ DHI severity (Level 1 reference group); values marked with * denote items with p < 0.05 and effect size (DIF contrast) > 0.5.
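The flagging rule in the note combines a significance criterion with an effect-size criterion. A minimal sketch of that rule follows; the function name and the choice to compare the absolute value of the DIF contrast are assumptions made for illustration, and the rows are a small selection copied from Table 6.

```python
def shows_dif(prob, dif_contrast, alpha=0.05, min_contrast=0.5):
    """Apply the rule from the note: p < 0.05 and |DIF contrast| > 0.5.

    Using the absolute value of the contrast is an assumption of this sketch.
    """
    return prob < alpha and abs(dif_contrast) > min_contrast


# (grouping variable, item, probability, DIF contrast) copied from Table 6.
rows = [
    ("Sex", 1, 0.8824, 2.14),
    ("Sex", 3, 0.0274, 6.85),
    ("Sex", 24, 0.0162, 7.22),
    ("OD vs. No OD", 24, 0.0105, -10.13),
    ("Diagnostic Category", 20, 0.0002, -0.36),
]

flagged = [(group, item) for group, item, p, contrast in rows if shows_dif(p, contrast)]
print(flagged)
```

Item 20 under diagnostic category illustrates why both criteria matter: its probability is well below 0.05, but its contrast of −0.36 does not exceed 0.5 in magnitude, so it is not flagged by this rule.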
Table 7. Standardised residual variance.
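| Variance | All 25 Items: Eigenvalue | All 25 Items: Observed (%) | All 25 Items: Expected (%) | Physical Scale: Eigenvalue | Physical Scale: Observed (%) | Physical Scale: Expected (%) | Function Scale: Eigenvalue | Function Scale: Observed (%) | Function Scale: Expected (%) | Emotional Scale: Eigenvalue | Emotional Scale: Observed (%) | Emotional Scale: Expected (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Total raw variance in observations | 43.80 | 100 | 100 | 14.70 | 100 | 100 | 19.65 | 100.0 | 100.0 | 12.66 | 100.0 | 100.0 |
| Raw variance explained by measures | 18.80 | 42.9 | 42.7 | 5.70 | 38.8 | 38.4 | 10.65 | 54.2 | 54.2 | 5.66 | 44.7 | 43.9 |
| Raw variance explained by persons | 9.05 | 20.7 | 20.6 | 2.72 | 18.5 | 18.3 | 5.53 | 28.2 | 28.2 | 3.02 | 23.8 | 23.4 |
| Raw variance explained by items | 9.75 | 22.3 | 22.2 | 2.98 | 20.3 | 20.1 | 5.12 | 26.0 | 26.0 | 2.64 | 20.9 | 20.5 |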
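The observed percentages in Table 7 follow from the eigenvalues: each is the eigenvalue divided by the total raw variance in observations. The sketch below reproduces the "explained by measures" percentages and derives the unexplained variance as the remainder. It is an arithmetic check under that assumption, with illustrative names, not a re-analysis of the data.

```python
# Eigenvalues copied from Table 7: total raw variance and variance explained by measures.
variance = {
    "All 25 items": {"total": 43.80, "measures": 18.80},
    "Physical": {"total": 14.70, "measures": 5.70},
    "Function": {"total": 19.65, "measures": 10.65},
    "Emotional": {"total": 12.66, "measures": 5.66},
}


def explained_percent(total, explained):
    """Percentage of total raw variance accounted for by the Rasch measures."""
    return round(100 * explained / total, 1)


def unexplained_eigenvalue(total, explained):
    """Raw unexplained variance, in eigenvalue units."""
    return round(total - explained, 1)


summary = {
    name: (
        explained_percent(v["total"], v["measures"]),
        unexplained_eigenvalue(v["total"], v["measures"]),
    )
    for name, v in variance.items()
}

for name, (pct, unexplained) in summary.items():
    print(f"{name}: {pct}% explained, unexplained eigenvalue {unexplained}")
```

The unexplained eigenvalues obtained this way (25.0, 9.0, 9.0 and 7.0) equal the number of items in each analysis.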
Table 8. Standardised residual variance.
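| Scale | Unexplained Variance | Raw Unexplained Variance (Total) | 1st Contrast | 2nd Contrast | 3rd Contrast | 4th Contrast | 5th Contrast |
|---|---|---|---|---|---|---|---|
| Full scale | Eigenvalue | 25.0 | 3.4 | 2.1 | 1.8 | 1.6 | 1.3 |
| Full scale | Observed (%) | 57.1 | 7.8 | 4.8 | 4.1 | 3.7 | 3.0 |
| Full scale | Expected (%) | 57.3 | 13.7 | 8.4 | 7.2 | 6.4 | 5.2 |
| Physical subscale | Eigenvalue | 9.0 | 1.7 | 1.4 | 1.3 | 1.1 | 1.0 |
| Physical subscale | Observed (%) | 61.2 | 11.7 | 9.6 | 9.0 | 7.6 | 7.0 |
| Physical subscale | Expected (%) | 61.6 | 19.1 | 15.6 | 14.7 | 12.4 | 11.4 |
| Function subscale | Eigenvalue | 9.0 | 1.9 | 1.5 | 1.2 | 1.0 | 1.0 |
| Function subscale | Observed (%) | 45.8 | 9.4 | 7.4 | 6.2 | 5.3 | 5.2 |
| Function subscale | Expected (%) | 45.8 | 20.6 | 16.2 | 13.6 | 11.5 | 11.4 |
| Emotional subscale | Eigenvalue | 7.0 | 1.7 | 1.2 | 1.2 | 1.1 | 1.0 |
| Emotional subscale | Observed (%) | 55.3 | 13.6 | 9.7 | 9.1 | 8.5 | 7.9 |
| Emotional subscale | Expected (%) | 56.1 | 24.6 | 17.6 | 16.4 | 15.4 | 14.3 |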
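One common way to read the contrasts in Table 8 is to compare the eigenvalue of the first contrast with a cut-off. The sketch below uses a cut-off of 2.0, a widely quoted rule of thumb in Rasch practice (a contrast with the strength of about two items); this cut-off is an assumption of the sketch and is not stated in the table, and the names are illustrative.

```python
# First-contrast eigenvalues copied from Table 8.
first_contrast_eigenvalue = {
    "Full scale": 3.4,
    "Physical subscale": 1.7,
    "Function subscale": 1.9,
    "Emotional subscale": 1.7,
}

# Assumed rule of thumb, not taken from the source table.
EIGENVALUE_CUTOFF = 2.0

possible_second_dimension = [
    scale
    for scale, eigenvalue in first_contrast_eigenvalue.items()
    if eigenvalue > EIGENVALUE_CUTOFF
]

print(possible_second_dimension)
```

Under this assumed cut-off only the full 25-item scale shows a first contrast large enough to suggest a secondary dimension, while each subscale analysed on its own stays below it.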
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cordier, R.; Joosten, A.V.; Heijnen, B.J.; Speyer, R. A Psychometric Evaluation of the Dysphagia Handicap Index Using Rasch Analysis. J. Clin. Med. 2024, 13, 2331. https://doi.org/10.3390/jcm13082331