1. Introduction
Item response theory (IRT) comprises a set of psychometric models used to develop and refine questionnaire measures (
Embretson and Reise 2000). It is useful for evaluating questionnaires in the health field, and in psychometric evaluation in general, because it yields more detailed information with which to improve these instruments (
Valentini and Laros 2011). No other study has applied these analyses to the Spiritual Needs Questionnaire.
The Spiritual Needs Questionnaire (SpNQ) measures psychosocial, existential, and spiritual needs in clinical contexts. As a coping strategy, religiosity/spirituality plays a vital role in how patients face the consequences and daily demands of various chronic diseases, improving their quality of life and even affirming their sense of purpose (
Büssing et al. 2018). SpNQ (
Büssing et al. 2010) evaluates the intensity of individuals’ spiritual, existential, and psychosocial needs: respondents indicate whether a specific need within four dimensions (social, emotional, existential, and religious) is present and how strong it is, using a 4-point scale. It differentiates four main dimensions, i.e., religious needs, existential needs, inner peace needs, and giving/generativity needs (
Büssing et al. 2010,
2018). The SpNQ was validated in Portuguese (
Valente et al. 2018) in a sample of HIV+ patients, with internal consistency ranging from 0.51 to 0.83.
The SpNQ has already been translated and validated in 12 studies. However, the psychometric analyses performed so far have relied only on classical test theory (CTT), using exploratory factor analysis and internal consistency analysis, which are conventional quantitative approaches for testing the reliability and validity of a scale from its items (
Cappelleri et al. 2014;
Sartes and Souza-Formigoni 2013). In contrast, the present study evaluated the SpNQ items from the perspective of IRT to obtain complementary validity evidence on the characteristics of the items that make up the instrument, in this case in its Portuguese version. The main question was: do the items of the SpNQ validated in Portuguese have appropriate psychometric qualities to discriminate between respondents in their probability of choosing one response category rather than another within the same item, and do they show whether the questionnaire is biased towards a pattern of responses desired by the researcher?
The SpNQ has been validated in different countries, and its applications across socio-cultural contexts would benefit from the results of this analysis, which quantify how accurately the items measure and differentiate patients’ responses.
2. The Item Response Theory
Item response theory (IRT), or latent trait theory, refers to a set of mathematical models that take the item as the basic unit of analysis and represent the probability that an individual gives a particular response to an item as a function of the item’s parameters and the respondent’s ability (or trait) (
Primi et al. 2014). In general, IRT allows specific characteristics of the individuals to be measured (
Andrade et al. 2000) and verifies whether a given questionnaire has the desirable psychometric quality of discriminating between respondents.
IRT evaluates the performance of the individual on each item (that is, the behavior or effect that is directly observed). The theory aims to overcome some limitations of classical test theory (CTT). The main one is that, in CTT, a scale is built only from the results obtained with the questionnaire as a whole, whereas IRT uses the item as the fundamental unit of analysis. Secondly, in CTT, the measures depend on the sample of individuals who answered the questionnaire and are therefore valid only for that specific sample, or a similar one (
Embretson and Reise 2000). In addition, under CTT, tests with different indexes of difficulty and discrimination generate different results for the same individuals.
In CTT, even if two different tests measure the same construct, their results are not expressed on the same scale, which prevents a direct comparison between them. The latent trait estimated under IRT, in contrast, allows comparisons between respondents from distinct samples, whether they took the same test or different ones (
Embretson and Reise 2000).
Another point concerns the reliability of the evaluation: CTT assumes that two tests applied to the same sample must produce valid and identical scores, with identical variances (
Pasquali and Primi 2003;
Pasquali 2009). Finally, CTT also assumes that the variance of the measurement error is the same for all respondents (
Sartes and Souza-Formigoni 2013). IRT, in contrast, yields distinct levels of precision for each item at specific latent trait levels.
IRT also considers how the response comes about (i.e., its causes), taking into account the set of variables involved and the intensity of the latent trait present in the individual. Conclusions are therefore tied not to the test or questionnaire as a whole but to each item that constitutes it. The analysis yields a set of valid items, from which numerous tests can be assembled (
Andrade et al. 2000;
Lima et al. 2019).
Latent trait theory assumes that responses to the items of a test are behavioral manifestations of one or more of the respondent’s latent traits. In other words, it considers that a psychological process causes the behavior observed in the questionnaire response. Two primary axioms are used regarding latent traits. (1) The performance of the subject on the item (task) is predicted from a set of latent traits (aptitudes or abilities), which are denoted by the Greek letter theta (θ); the performance is the effect, and the latent traits are its cause. (2) The item characteristic curve (ICC) is a mathematical function relating performance on the item to the latent trait of the individual who answered it. The ICC shows that individuals with a higher level of the latent trait are more likely to endorse or correctly answer an item (in the dichotomous case), or to choose a specific category of an item (in a polytomous response) (
Cappelleri et al. 2014).
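As an illustration of axiom (2), the following minimal Python sketch evaluates an item characteristic curve for a dichotomous item. It assumes a two-parameter logistic form, which the text does not specify; the function name `icc_2pl` and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def icc_2pl(theta, a, b):
    """Probability of endorsing a dichotomous item at trait level theta.

    Assumes a two-parameter logistic curve with discrimination a and
    difficulty b (an illustrative choice, not the model used in the study).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: discrimination a = 1.5, difficulty b = 0.0.
low = icc_2pl(-2.0, 1.5, 0.0)
mid = icc_2pl(0.0, 1.5, 0.0)
high = icc_2pl(2.0, 1.5, 0.0)
for theta, p in ((-2.0, low), (0.0, mid), (2.0, high)):
    print(f"theta={theta:+.1f}  P(endorse)={p:.3f}")
```

Under this assumed form, the probability of endorsement rises monotonically with theta and equals 0.5 exactly where theta equals the difficulty b.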
For polytomous items, Samejima’s Graded Response Model (SGRM) (
Samejima 1969) offers two parameters: discrimination (
a) and difficulty (
b), which may vary from item to item. Discrimination refers to the power of the item to differentiate subjects with different aptitudes; values above 0.60 are considered satisfactory (
Nakano et al. 2015). Difficulty is the level of theta (latent trait) required for an individual to choose specific response categories. The SGRM is appropriate for instruments with ordered categorical response items, and not all items need to have the same number of response categories (
Lima et al. 2019).
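For reference, the SGRM is commonly written as below. The notation is an assumption made for illustration and is not taken from the study: \(a_i\) is the discrimination of item \(i\), \(b_{ij}\) is its \(j\)-th threshold, \(k_i\) is its number of response categories, and \(P^{*}_{ij}(\theta)\) is the probability of responding above threshold \(j\). Some presentations add a scaling constant inside the exponent; it is omitted here.

```latex
\[
P^{*}_{ij}(\theta) = \frac{1}{1 + \exp\left[-a_i\,(\theta - b_{ij})\right]},
\qquad j = 1, \dots, k_i - 1,
\]
\[
P_{ij}(\theta) = P^{*}_{i,j-1}(\theta) - P^{*}_{ij}(\theta),
\qquad j = 1, \dots, k_i,
\qquad P^{*}_{i0}(\theta) = 1,\quad P^{*}_{i k_i}(\theta) = 0 .
\]
```

Here \(P_{ij}(\theta)\) is the probability of choosing category \(j\), so the category probabilities of an item sum to one at every level of theta.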
In addition to the discrimination and difficulty of the items, the model provides the amount of information of the items (accuracy), the information curves, and the item characteristic curves (ICCs). The ICCs are constructed from the number of response categories (k) of each item and the parameters of discrimination and difficulty. In the SGRM, there are (k − 1) response thresholds for each item, each marking the boundary between two adjacent categories. Each threshold corresponds to a measure of the difficulty of the item (parameter b). For example, an item on a four-category Likert scale has three thresholds (b1, b2, and b3); the first (b1) is the theta level above which an individual becomes more likely to respond in category 2 or higher than in category 1. The slope of each curve depends on the parameter “a” (discrimination).
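The link between thresholds and category curves can be sketched in Python as follows. This is a minimal illustration under the usual logistic form of the graded response model; the function name `grm_category_probs` and the item parameters are hypothetical and do not come from the study.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    thresholds: increasing list of k - 1 difficulty values (b1, b2, ...).
    Returns a list of k probabilities, one per response category.
    """
    # Cumulative curves: probability of responding above each threshold,
    # padded with 1 (below the first) and 0 (above the last).
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Differences of adjacent cumulative curves give each category.
    return [cumulative[j] - cumulative[j + 1] for j in range(len(thresholds) + 1)]


# Hypothetical four-category item: a = 1.5, thresholds b1..b3 = -1, 0, 1.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])

# At theta = b1, the chance of answering in category 2 or higher is one half.
probs_at_b1 = grm_category_probs(theta=-1.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
print(round(1.0 - probs_at_b1[0], 3))
```

Three thresholds produce four category probabilities, which is the (k − 1) relationship described above.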
IRT reports on the accuracy of the measure from a perspective different from that of CTT. It considers that the accuracy of the measure varies with the level of the latent trait and with the quality of the items located in the region near that theta. Thus, a test composed of items with good psychometric properties in higher regions of theta will better evaluate people with higher latent trait levels (
Nakano et al. 2015).
IRT also offers the item information function (IIF), displayed as the item information curve (IIC), which shows the amount of information (ability to differentiate between respondents) that the item provides at different latent trait levels. The IIF evaluates the accuracy of measurement of that item, considering the differences between respondents with different levels of the underlying trait (
Cappelleri et al. 2014), showing how well the items represent theta (
Pasquali 2007).
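The item information function can be sketched in Python as follows. The sketch uses the standard expression for graded items, in which the information at theta is the sum over categories of the squared derivative of the category probability divided by that probability. The function name `grm_item_information` and all parameter values are hypothetical.

```python
import math


def grm_item_information(theta, a, thresholds):
    """Item information at theta for a graded response item.

    Sums (dP_j / dtheta)^2 / P_j over the response categories, where the
    category probabilities P_j come from the logistic cumulative curves.
    """
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    information = 0.0
    for j in range(len(cumulative) - 1):
        upper, lower = cumulative[j], cumulative[j + 1]
        prob = upper - lower
        # Derivative of a logistic curve P* is a * P* * (1 - P*).
        slope = a * (upper * (1.0 - upper) - lower * (1.0 - lower))
        if prob > 0.0:
            information += slope * slope / prob
    return information


# Two hypothetical items sharing thresholds but differing in discrimination.
thresholds = [-1.0, 0.0, 1.0]
info_low_a = grm_item_information(0.0, 0.5, thresholds)
info_high_a = grm_item_information(0.0, 2.5, thresholds)
print(f"a=0.5: {info_low_a:.3f}   a=2.5: {info_high_a:.3f}")
```

In this example the more discriminating item carries more information at theta = 0, which matches the interpretation of the parameter a given in the text.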
IRT has been widely used to evaluate questionnaires in health, psychometrics, education, marketing, surveys, and cognitive diagnosis. In each of these areas, the items in the questionnaires can be calibrated to fit the mathematical model, considering the individual scores of the respondents (
Van Der Linden 2016).
In IRT, the standard error is inversely related to the information function: it equals the reciprocal of the square root of the information (
Nakano et al. 2015). The item and test information functions are relevant because they are a viable alternative to the classical concepts of precision and standard error. The advantage is that the standard error can be obtained at any chosen latent trait level, thus determining the accuracy at any theta level (
Pasquali 2007).
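The relationship between information and standard error can be illustrated with a short Python sketch. The function name `standard_error` and the information values are hypothetical; they are not results from this study.

```python
import math


def standard_error(information):
    """Standard error of the theta estimate given the information at that theta."""
    if information <= 0.0:
        raise ValueError("information must be positive")
    return 1.0 / math.sqrt(information)


# Hypothetical test information at three theta levels.
information_by_theta = {-2.0: 1.0, 0.0: 4.0, 2.0: 2.0}
errors = {theta: standard_error(info) for theta, info in information_by_theta.items()}
for theta, se in errors.items():
    print(f"theta={theta:+.1f}  SE={se:.3f}")
```

Because the standard error is computed separately at each theta, precision can differ across the trait range, unlike the single error variance assumed in CTT.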
4. Results
The SpNQ items were analyzed under item response theory using Samejima’s Graded Response Model. The parameters of the items are in
Table 1 below.
The items in the Religious Needs component showed discrimination between 0.66 (item 26. Transmit your own life experience to other people) and 2.57 (item 19. Have someone pray for you). All items presented satisfactory discrimination (>0.60). The items also covered a large portion of theta, given the b values (thresholds) between −3.0 (item 20 b1) and 1.46 (item 26 b3). Items 14 (Hand over, donate something of yours) and 26 presented a higher level of difficulty, since higher theta values were required to endorse the highest category (3 = very strong). The item with the largest area of information was item 19, and the total information of this component was 18.06.
The discrimination of the items in the Inner Peace and Family Support Needs component ranged from 0.30 (item 25. Feel connected with your family) to 3.03 (item 7. Remain in a place of stillness and peace). Item 25 discriminated between individuals unsatisfactorily (<0.60). The b values ranged from −6.5 (item 25 b1) to 1.90 (item 25 b3). Item 25 proved to be the most difficult for individuals, given the high theta value required to choose the highest category (3 = very strong). The item with the largest area of information was item 7 (6.29), and the total information of the component was 14.7.
The items in the Existential Needs component showed discrimination between 0.53 (item 10. Finding meaning in disease and/or suffering) and 2.89 (item 12. Talking to someone about the possibility of life after death). Only item 10 showed low discrimination. The items also covered a large portion of theta, given the b values (thresholds) ranging from −2.01 (item 2 b1) to 3.0 (item 10 b3). All items presented great difficulty, given the high theta values required. No respondent endorsed the highest category (3 = very strong) on item 2. The item with the largest area of information was item 12 (5.42), and the total information of the component was 7.89.
The Social Recognition Needs component showed discrimination between 0.88 (item 22. Reading religious/spiritual books) and 1.69 (item 3. That someone from your religious community (e.g., pastor, priest) would take care of you). All items discriminated adequately between individuals. The b values ranged from −0.84 (b1 of item 21) to 1.68 (b3 of item 3). Items 3 and 22 revealed greater difficulty. Item 3 presented the largest area of information, and the total information of the component was 6.20.
The fifth and final component, Time Domain Needs, presented high discrimination, between 1.08 (item 5. Solving “open” aspects, outstanding problems in your life) and 3.75 (item 4. Reflect on your past). The b values ranged from −0.62 (b1 of item 4) to 2.14 (b3 of item 5), with item 5 being the most difficult. The most informative item was item 4 (9.45), and the total information of the component was 11.07.
Next, we analyzed the total information curves of items (TICs) and the item characteristic curves (ICCs) separately by component.
From the total information curves of the items (TICs) of the Religious Needs component (
Figure 1), the items with the most information were items 14, 18, 19, and 20, which were also more accurate in estimating intermediate theta levels (around zero). Items 13, 15, 23, and 26 were less accurate in estimating theta levels.
The categories of items 18, 19, 20, 23, and 26 (
Figure 2) were differentiated along the levels of theta according to Pattern 3 (more desirable), in which every category has the highest probability at some point of theta. On the other hand, items 13, 14, and 15 presented a higher probability for the extreme responses, category 1 (0 = no) or category 4 (3 = very strong), and for only one of the intermediate categories, representing Pattern 2.
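One way to make the three curve patterns operational is sketched below in Python. This is an assumption made for illustration, not the procedure used in the study: a category counts as "more probable" if it is the single most likely response somewhere on a grid of theta values, and a four-category item is labeled 1, 2, or 3 according to whether zero, one, or both intermediate categories meet that criterion. The function names, the grid, and the item parameters are all hypothetical.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one graded response item at theta."""
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    return [cumulative[j] - cumulative[j + 1] for j in range(len(thresholds) + 1)]


def modal_categories(a, thresholds, lo=-4.0, hi=4.0, steps=801):
    """Categories (numbered from 1) that are the most likely response somewhere on the grid."""
    modal = set()
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        probs = grm_category_probs(theta, a, thresholds)
        modal.add(probs.index(max(probs)) + 1)
    return modal


def curve_pattern(a, thresholds):
    """Label a four-category item as pattern 1, 2, or 3 (illustrative rule)."""
    if len(thresholds) != 3:
        raise ValueError("this sketch handles four-category items only")
    intermediate = modal_categories(a, thresholds) & {2, 3}
    return 1 + len(intermediate)


# Hypothetical items, chosen to show each pattern.
pattern_spread = curve_pattern(2.5, [-1.5, 0.0, 1.5])    # well-separated thresholds
pattern_crowded = curve_pattern(2.5, [-1.5, 1.0, 1.1])   # two thresholds nearly equal
pattern_flat = curve_pattern(0.5, [-0.2, 0.0, 0.2])      # low a, crowded thresholds
print(pattern_spread, pattern_crowded, pattern_flat)
```

Under this rule, well-separated thresholds combined with high discrimination give every category its own region of theta, while crowded thresholds or low discrimination leave intermediate categories without one.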
TICs of the items of the Inner Peace and Family Support Needs component (
Figure 3) showed that items 6, 7, and 8 had more information and were more accurate in estimating theta levels between −3 and 0. Items 13, 15, 23, and 26 were less accurate in estimating theta levels.
From the ICCs of the Inner Peace and Family Support Needs component (
Figure 4), the categories of item 6 were differentiated along the levels of theta according to Pattern 3 (more desirable), in which every category has the highest probability at some point of theta. On the other hand, items 7, 8, 25, 28, and 30 presented a higher probability for the extreme responses, category 1 (0 = no) or category 4 (3 = very strong), and for only one of the intermediate categories, representing Pattern 2. Item 25 presented a higher probability of category 4 (3 = very strong).
From the information curves of the items in the Existential Needs component (
Figure 5), item 12 had the most information and was the most accurate in estimating intermediate theta levels. On the other hand, items 2, 10, and 11 presented low information.
The ICCs of the Existential Needs component (
Figure 6) showed that the categories of items 2 and 11 were distributed along the theta levels according to Pattern 3 (more desirable), in which every category has the highest probability at some point of theta. In item 2 (Talk to others about your fears and concerns), in particular, the highest category (3 = very strong) was not chosen by any respondent. Items 10 and 12 presented a higher probability for the extreme responses, category 1 (0 = no) or category 4 (3 = very strong), and for only one of the intermediate categories, representing Pattern 2.
From the total information curves of the items in the Social Recognition Needs component (
Figure 7), all items showed peaks of information below 1, with items 3 and 21 being the most accurate in estimating intermediate theta levels. Item 22 was less accurate.
From the ICCs of the Social Recognition Needs component (
Figure 8), items 3 and 21 presented Pattern 2: a higher probability for the extreme responses, category 1 (0 = no) or category 4 (3 = very strong), and for only one of the intermediate categories. Item 22 presented Pattern 1 (less desirable), with a higher probability of extreme responses.
For Time Domain Needs, item 4 was more accurate in estimating intermediate theta levels, while item 5 presented low information (
Figure 9).
From the ICCs of the component Time Domain Needs (
Figure 10), item 4 presented Pattern 3, because every category was the most probable response at some level of theta. In item 5, only one intermediate category was more likely, representing Pattern 2.
5. Discussion
The evaluation of spiritual needs by SpNQ has proved valuable in varied cultural contexts: China (
Büssing et al. 2013), Croatia (
Glavas et al. 2017), Poland (
Büssing et al. 2015), Indonesia (
Nuraeni et al. 2015), Iran (
Nejat et al. 2016), Brazil (
Valente et al. 2018), and Lithuania (
Riklikienė et al. 2019). The instrument has been tested in samples with different conditions: chronic pain (
Büssing et al. 2009;
2013;
2015, p. 11), post-traumatic stress (
Glavas et al. 2017), cancer (
Büssing et al. 2010;
Riklikienė et al. 2019) and HIV+ patients (
Valente et al. 2018;
Silva et al., forthcoming). This study verified the psychometric properties of the SpNQ items through item response theory (IRT), an innovative contribution to the SpNQ validity studies given the relevance of questionnaires evaluating spiritual needs and their associated constructs.
The application of IRT through Samejima’s Graded Response Model (SGRM) allowed a more detailed investigation of the strengths and weaknesses of the items that make up the SpNQ, providing indices of the instrument’s accuracy, the discrimination and difficulty of the items, and an evaluation of the patterns of the characteristic curves of each item.
In this theory, the items that offer the largest area of information also present higher discrimination parameters. The higher the discrimination, the greater the ability to differentiate individuals as theta changes, providing greater accuracy (
Embretson and Reise 2000). The results of this work showed satisfactory parameters for the SpNQ: most of the items were discriminatory (
a > 0.60). There is now evidence of an adequate ability to distinguish individuals located in different regions of the latent traits investigated.
In the Religious Needs component, items 14 (Give up, give something of yours), 18 (Pray with someone), and 19 (Having someone to pray for you) were the most discriminating and precise in predicting this latent trait. Item 26 (Transmitting your own life experience to other people) was more difficult for these HIV+ patients to endorse, probably because of the stigma and prejudice surrounding the personal experiences of HIV-positive people (
Bennett et al. 2016).
In the Inner Peace and Family Support Needs component, items 6 (Have more contact with the beauty of nature), 7 (Stay in a place of stillness and peace), and 8 (Find inner peace) had the most satisfactory parameters. Item 25 (Feeling connected with your family) was more difficult to endorse, probably because of the distance between the family and the patient, and the subsequent isolation (
Maposse and Seidl 2019). This item was neither discriminating nor precise, on its own, in predicting levels of the evaluated construct. Thus, it proved unsatisfactory.
Regarding the Existential Needs component, the most reliable item was 12 (Talk to someone about the possibility of life after death). Item 10 (Finding sense in disease and/or suffering) did not prove reliable in discriminating between different levels of Existential Needs. All items of this component showed great difficulty because they refer to questions such as the meaning of life and death, which are strongly associated with HIV and other diseases.
In the Social Recognition Needs component, items 3 (That someone from your religious community (e.g., pastor, priest) take care of you) and 21 (Participate in a religious ceremony (e.g., mass, worship)) were the best evaluated. Item 22 (Reading religious/spiritual books) was more difficult for these participants to endorse. This is understandable, considering that only 16 out of 157 respondents reported actively participating in a religion.
Finally, in the Time Domain Needs component, item 4 (Reflect on your past) was the most reliable for predicting the level of this latent trait and discriminated adequately between individuals. Item 5 (Solving “open” aspects, outstanding problems in your life) was more difficult to endorse in this sample.
In general, the items showed satisfactory discrimination and covered a broad range of theta. The questionnaire offers items of varying difficulty and is most accurate at intermediate levels of need. The ICCs showed a satisfactory pattern for most of the items. Still, the most common pattern was a higher probability of extreme responses (no or very strong), which was expected, as the SpNQ offers a single option for disagreeing with the item statement and three degrees of intensity for agreeing with it.
These results are not conclusive, and further studies applying the SpNQ should be conducted for comparison. It is essential to detect possible group biases through differential item functioning analysis within IRT. Other IRT resources can be applied in studies with the SpNQ, especially the construction of a person-item map for the development of interpretative standards based on the items (
Cappelleri et al. 2014;
Primi et al. 2014).
It was possible to find weaknesses in the questionnaire when applied specifically to the HIV+ population: items with low discrimination and precision indices were identified, making them the least suitable for predicting responses from the difference between the person’s theta and the item’s intensity. In such cases, people with a high theta value could endorse response categories that do not correspond to their latent trait level in spiritual needs (
Linacre 2015).
This study advanced the validity evidence for the SpNQ. Although the results were satisfactory, the study has limitations: the small sample size; the specific disease of the sample, which could have biased the results; the lack of similar information about this population in Brazil; and the insufficient national representativeness of the sampling in the only two studies that have applied the questionnaire so far. Future studies in other Brazilian regions, with larger numbers of respondents, are necessary for comparison of results.