Article

A Nonparametric Statistical Approach to Content Analysis of Items

by Diego Marcondes 1,*,† and Nilton Rogerio Marcondes 2,†
1 Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo 05508-090, Brazil
2 Faculdade de Psicologia e de Ciências da Educação, Universidade de Coimbra, 3000-115 Coimbra, Portugal
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Stats 2018, 1(1), 1-13; https://doi.org/10.3390/stats1010001
Submission received: 6 December 2017 / Revised: 23 January 2018 / Accepted: 25 January 2018 / Published: 1 February 2018

Abstract

In order to use psychometric instruments to assess a multidimensional construct, we may decompose it into dimensions and, in order to assess each dimension, develop a set of items, so one may assess the construct as a whole, by assessing its dimensions. In this scenario, content analysis of items aims to verify if the developed items are assessing the dimension they are supposed to by requesting the judgement of specialists in the studied construct about the dimension that the developed items assess. This paper aims to develop a nonparametric statistical approach based on the Cochran’s Q test to analyse the content of items in order to present a practical method to assess the consistency of the content analysis process; this is achieved by the development of a statistical test that seeks to determine if all the specialists have the same capability to judge the items. A simulation study is conducted to check the consistency of the test and it is applied to a real validation process.

1. Introduction

Psychometric instruments are built in order to assess psychological constructs that cannot be operationally defined and, consequently, cannot be objectively assessed, such as multidimensional constructs that, according to [1], consist of a number of interrelated attributes or dimensions and exist in multidimensional domains. In order to develop a psychometric instrument to assess a multidimensional construct, a set of items, that assess a dimension, is developed for each one of its dimensions in furtherance of assessing the construct as a whole. The validation process of such an instrument must guarantee that each item assesses its dimension correctly according to desirable characteristics such as reliability and trustworthiness [2].
As psychometric instruments play an important role in research in psychology and education, they must be thoroughly developed and validated, so that no erroneous results are obtained from their application. The validity of an instrument is divided into four categories: predictive validity, concurrent validity, construct validity and content validity. The first two of these may be considered together as criterion-oriented validation processes [3]. Predictive validity is studied when the instrument assesses a construct correlated with the criterion, providing a prediction for it, and concurrent validity is studied when the instrument is proposed as a substitute for another [3]. The study of construct validity is necessary when the result of the instrument is the measure of an attribute or a characteristic that is not operationally defined, so an instrument is valid when it is possible to determine which construct accounts for the variance of its performance. Furthermore, content validity is established by showing that the instrument items are a sample of a universe in which the investigator is interested and is ordinarily established deductively, by defining a universe of items and sampling systematically within this universe to build the instrument [3]. Another definition of content validity is the degree to which elements of an assessment instrument are relevant to and representative of the targeted construct for a particular assessment purpose [2]. See [4] for more details on instrument validity and [5,6] for practical examples of validation processes.
A list consisting of thirty-five procedures for content validation was proposed by [2]. Amidst these procedures are to match each item to the dimension of the construct that it assesses and request the judgement of specialists in the construct, also called judges, about the developed items. The accomplishment of these procedures is imperative to verify if the developed items are a sample of the universe that the instrument aims to assess. These procedures, components of the theoretical analysis of items, are subjective as they rely on the personal opinions of specialists and researchers. Indeed, the theoretical analysis of items is done by judges and aims to establish the comprehension of the items (semantic analysis) and their pertinence to the attribute that they propose to assess.
This paper proposes a nonparametric statistical approach, based on the Cochran’s Q test, to the content analysis of items, in order to assess its consistency and reliability. Therefore, our approach does not seek to establish the validity of the instrument, but rather to assess the consistency of the content analysis process, so that its conclusions about the instrument may be trusted. Thus, this approach must be applied alongside other instrument validation methods, quantitative and qualitative, e.g., semantic analysis, pretrial and factorial analysis, in order to ensure the reliability, consistency, validity and trustworthiness of psychometric instruments.

2. Method

The researcher, supported by the theory of the construct that the instrument aims to assess, develops m items and assigns to each item a theoretical dimension, according to the theory and/or their opinion about which dimension the item assesses. Although the items and their dimensions have theoretical foundations, it is necessary to test them in order to determine if every item is indeed assessing the dimension it is supposed to. To carry out such a test, the items are sent to s specialists in the construct, so that they may judge the items according to the dimension they assess. The items should be sent to at least six specialists and should be presented to them in a random order and without their theoretical dimensions, so that their judgement is not biased.
A condition for an item to be excluded from the instrument is determined based on the judgement of the specialists. This condition must exclude the items that do not belong to the universe that the instrument aims to assess, so that the not excluded items are a sample of such a universe. A possible way to proceed is to determine a Concordance Index (CI) that states that all items in which less than c% of the specialists agree on the dimension they assess must be excluded. One may also take the Content Validity Ratio (CVR), as proposed by [7], as a condition to exclude items that do not belong to the universe that the instrument aims to assess.
The method to be developed in this paper aims to determine whether all specialists have the same capability to judge the items according to their dimensions, through the analysis of their judgement about the items that were not excluded by the established condition. However, the method does not rank the specialists according to their capabilities, but only determines if all specialists have the same capability, so it is not possible to determine the specialists with low capability.
On the one hand, if there is no evidence that the capabilities of the specialists are different, their judgement is accepted and the items that are not excluded by the established condition are used in the next steps of the instrument validation process. Indeed, if all specialists have the same capability, it may happen that they are all highly capable or almost incapable of judging the items, though the proposed method is not able to differentiate between the two cases. Nevertheless, the two scenarios may be differentiated by a qualitative analysis of the specialists’ judgement, by observing if they agree with the theoretical dimension of the items and, when they do not, if there is some theory that supports their choice. Therefore, if their collective judgement is consistent with some theory, then the specialists may be regarded as all being highly capable of judging the items, given that they all have the same capability to judge them.
On the other hand, if it is determined that the specialists do not all have the same capability to judge the items, then at least one specialist is less capable of judging them than the others, which may bias the validation of the instrument. Therefore, in such a scenario, we propose two approaches in order to avoid a biased validation process. First, we propose that the specialists’ judgement be disregarded and a new group of specialists be requested to judge the items. However, this approach may be impractical, as time and resources may be too limited to repeat the cycle of specialists’ judgement more than once. Second, we propose a much more practical approach, which consists in applying the proposed method to all subgroups of specialists of size $s^*$, $6 \le s^* < s$, of the original group of specialists, and then choosing the judgement of a subgroup whose specialists all have the same capability to judge the items. This approach is presented in more detail in the application section.

3. Notation and Definitions

Let $C = \{C_1, \dots, C_n\}$ be a construct divided into $n$ dimensions and $U$ be the universe of all the items that assess the dimensions of $C$. A set $I = \{i_1, \dots, i_m\}$ of $m$ items is developed based on the theory about $C$ and then a subset $I^* \subseteq I$ of items, which we believe to be a subset of $U$, is determined by the following process.
Denote by $E = \{e_1, \dots, e_s\}$ a set of $s$ specialists and let $C_{c(i_l)} \in C$ be the dimension that the item $i_l \in I^*$ assesses. Let the random variables $\{X_{i_l}(e_j) : i_l \in I, e_j \in E\}$, defined on $(\Omega, \mathcal{F}, \mathbb{P})$, be such that $X_{i_l}(e_j) = k$ if the specialist $e_j$ judged the item $i_l$ at the $k$th dimension of $C$ (in the following, $l$ goes from 1 to $m$ and $j$ goes from 1 to $s$). Note that if $i_l \in I^*$ and $X_{i_l}(e_j) = c(i_l)$, then the specialist $e_j$ judged the item $i_l$ correctly. The capability of the specialist $e_j$ to judge the items is defined as
$$P(e_j) = \big(P_{i_l}(e_j) : i_l \in I^*\big)$$
in which $P_{i_l}(e_j) = \mathbb{P}\{X_{i_l}(e_j) = c(i_l)\}$, $\forall i_l \in I^*$ and $\forall e_j \in E$. In the proposed approach, denoting by $|I^*|$ the length of $I^*$, we are interested in developing a hypothesis test to determine if $P(e_j) = p \in [0,1]^{|I^*|}$, $\forall e_j \in E$, i.e., if all specialists have the same capability to judge the items.
For this purpose, let a random sample of the judgement of the specialist $e_j$ about the items of $I$ be given by $x_{e_j} = \{x_{i_1}(e_j), \dots, x_{i_m}(e_j)\}$ and let $\mathcal{X}$ be the space of all possible random samples $\{x_{e_j} : e_j \in E\}$. Define the random sets $\{M_{i_l} : i_l \in I\}$ as
$$M_{i_l} = \arg\max_{k \in \{1, \dots, n\}} \sum_{e_j \in E} \mathbb{1}_{\{k\}}\big(X_{i_l}(e_j)\big)$$
in which $\mathbb{1}_{\{A\}}(\cdot)$ is the indicator function of the set $A$. Note that $M_{i_l}$ is the set containing the numbers of the dimensions at which the majority of the specialists judged the item $i_l \in I$. Given a random sample $\{x_{e_j} : e_j \in E\} \in \mathcal{X}$ and a subset $I^* \subseteq I$ of items, the set $\{m_{i_l} : i_l \in I^*\}$, determined from the sample values $\{x_{e_j} : e_j \in E\}$, is a random sample of $\{M_{i_l} : i_l \in I^*\}$.
The subset $I^*$ may be defined by a condition function, a function of the sample $\{x_{e_j} : e_j \in E\}$, given by $f : \mathcal{X} \to \mathcal{P}(I)$, in which $\mathcal{P}(\cdot)$ is the power set operator. The condition function must be such that if $\{m_{i_l} : i_l \in I^*\}$ is determined from $\{x_{e_j} : e_j \in E\} \in \mathcal{X}$ and $I^* = f(x_{e_j} : e_j \in E)$, then the length of $m_{i_l}$ is one, $\forall i_l \in I^*$. The CI for $c > 50$ and the CVR are condition functions. As the CI may be obtained from other concordance indexes, such as the Content Validity Index (CVI), which is used to measure concordance when the construct is unidimensional and the task of the specialists is to judge the item’s relevance [8,9,10,11], the method developed in this paper may also be applied in other scenarios. From now on, it is supposed that the condition function may be expressed as a CI.
The condition function is based on the assumption that an item is in the universe of items that assess the construct of interest if the majority of specialists agree on the dimension it assesses. Of course, one may take a different criterion to exclude the items that do not assess the construct of interest, although our method may be applied only if the criterion can be expressed as a condition function, as it is based on the fact that $M_{i_l}$ is a univariate random variable.
Finally, define
$$W_{i_l}(e_j) = \mathbb{1}_{\{M_{i_l}\}}\big(X_{i_l}(e_j)\big)$$
as the random variable that indicates if the specialist $e_j$ judged the item $i_l$ at the same dimension as the majority of the specialists. Given a random sample $\{x_{e_j} : e_j \in E\} \in \mathcal{X}$ and a subset $f(x_{e_j} : e_j \in E) = I^* \subseteq I$ of items, the set $\{w_{i_l}(e_j) : i_l \in I^*, e_j \in E\}$, determined from the sample values $\{x_{e_j} : e_j \in E\}$, is a random sample of $\{W_{i_l}(e_j) : i_l \in I^*, e_j \in E\}$.
On the one hand, whilst we observe the values of the random variables $\{X_{i_l}(e_j) : i_l \in I, e_j \in E\}$, we do not know if the specialists judged the items correctly or not, as the dimension that an item really assesses (if any) is unknown. Therefore, it is not possible to differentiate the specialists by the number of items they judged correctly, for example. On the other hand, from the random variables $\{W_{i_l}(e_j) : i_l \in I, e_j \in E\}$, we know the concordance of the specialists on the judgement of the items, which gives us a relative measure of their capability to judge the items. Therefore, we are able to test if all the specialists have the same capability to judge the items by applying the Cochran’s Q test, although we cannot determine the capability of each one.
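The construction of $I^*$, of the majority dimensions and of the concordance indicators can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `concordance`, the list-of-lists layout of the judgements and the toy data are hypothetical and are not taken from the authors' R script, and the condition function is assumed to be a CI that keeps an item when at least c% of the specialists agree.

```python
from collections import Counter


def concordance(judgements, c=50.0):
    """Apply a Concordance Index of c% and build the 0/1 concordance table.

    judgements[l][j] is the dimension (1..n) at which specialist j judged
    item l. Returns (kept, majority, W): the indices of the items kept in
    I*, the majority dimension of each kept item, and the rows of W.
    """
    kept, majority, W = [], [], []
    for l, row in enumerate(judgements):
        dim, votes = Counter(row).most_common(1)[0]
        # Keep the item only if at least c% of the specialists agree.
        # Assumption: no tie for the most voted dimension among kept items
        # (true for c > 50, or for c = 50 with an odd number of specialists).
        if 100.0 * votes >= c * len(row):
            kept.append(l)
            majority.append(dim)
            W.append([1 if x == dim else 0 for x in row])
    return kept, majority, W


# Toy data (hypothetical): three items judged by five specialists.
judgements = [
    [1, 1, 1, 2, 1],  # four of five specialists agree on dimension 1
    [1, 2, 3, 1, 2],  # no dimension reaches 50% agreement: excluded
    [3, 3, 3, 3, 3],  # unanimous
]
kept, majority, W = concordance(judgements)
print(kept, majority, W)
```

Each row of `W` records, for one kept item, which specialists agreed with the majority; this table is what the Cochran’s Q test takes as input.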

4. Assumptions

The development of the items and the judgement of the specialists must satisfy two assumptions so that the method presented below may be applied:
  • Each item $i_l \in I^*$ assesses one, and only one, dimension $C_{c(i_l)} \in C$.
  • The random variables $\{X_{i_l}(e_j) : i_l \in I^*, e_j \in E\}$ are independent.
Assumption 1 establishes that the items that were not excluded by the condition function, i.e., the items in $I^*$, are well constructed and assess only one dimension of $C$, while Assumption 2 imposes that the specialists judge the items independently of each other and that the judgement of a specialist about one item does not depend on their judgement about any other item. These assumptions are not strong, as it is expected that they are satisfied if the items were well constructed. Indeed, the better the condition function in determining which items are not in $U$, the better the quality of the items in $I^*$. Therefore, the assumptions above are closely related to the condition function. If, in fact, $I^* \subseteq U$, then the first assumption is immediately satisfied, as there is no intersection between two dimensions of a construct, and the second assumption may also hold, as the items are well defined.

5. Mathematical Deduction

Given a random sample $\{x_{e_j} : e_j \in E\} \in \mathcal{X}$, it is not trivial to estimate the capabilities $\{P(e_j) : e_j \in E\}$, as the dimension that each item assesses is unknown. Examining such a random sample, it is known that the specialist $e_j$ judged the item $i_l$ at the dimension $C_k$, but it is not possible to determine, with probability 1, if the item was judged correctly. Therefore, the problem is, given a random sample $\{x_{e_j} : e_j \in E\} \in \mathcal{X}$, to determine random variables that allow us to test if the capability of all the specialists is the same. It will be shown that if the random variables $\{W_{i_l}(e_j) : e_j \in E\}$ are not identically distributed for all $i_l \in I^*$, then the specialists do not all have the same capability to judge the items. Indeed, in order to test if the capability of all specialists is the same, we consider the following null hypotheses:
$$H_0 : \begin{cases} 1.\ P(e_j) = \big(p(i_1), \dots, p(i_{|I^*|})\big) = p \in [0,1]^{|I^*|},\ \forall e_j \in E \\ 2.\ \big(\mathbb{P}\{X_{i_l}(e_j) = 1\}, \dots, \mathbb{P}\{X_{i_l}(e_j) = n\}\big) \text{ is a permutation of } \big(p(i_l), p_1(i_l), \dots, p_{n-1}(i_l)\big) \in [0,1]^n,\ p(i_l) + \sum_{k=1}^{n-1} p_k(i_l) = 1,\ \forall i_l \in I^*,\ \forall e_j \in E \end{cases}$$
Of course, we are only interested in testing the first part of $H_0$, which refers to the capability of the specialists, i.e., that all specialists have the same capability to judge the items. However, the second part is needed to develop a test statistic for $H_0$. It will be argued that, for large values of $p(i_l)$, the hypothesis that is actually being tested is the first one.
The propositions below set the scenario for the nonparametric test, i.e., the Cochran’s Q test, that is used to test $H_0$.
Proposition 1.
The random variables $\{W_{i_l}(e_j) : i_l \in I^*\}$ are independent $\forall e_j \in E$, but the random variables $\{W_{i_l}(e_j) : e_j \in E\}$ are dependent $\forall i_l \in I^*$.
Proof. 
On the one hand, the random variables $\{W_{i_l}(e_j) : i_l \in I^*\}$ are, by Assumption 2, functions of independent random variables, therefore they are independent. On the other hand, note that $\sum_{e_j \in E} W_{i_l}(e_j) \ge \frac{c\,s}{100}$, for at least $c\%$ of the specialists must agree on the dimension that an item in $I^*$ assesses, which establishes a dependence. ☐
Proposition 2.
Under $H_0$, the random variables $\{W_{i_l}(e_j) : e_j \in E\}$ are identically distributed for all $i_l \in I^*$.
Proof. 
We have that
$$\mathbb{P}\{W_{i_l}(e_j) = 1\} = \mathbb{P}\{X_{i_l}(e_j) = M_{i_l}\} = \mathbb{P}\{X_{i_l}(e_j) = c(i_l), M_{i_l} = c(i_l)\} + \mathbb{P}\{X_{i_l}(e_j) = M_{i_l}, M_{i_l} \neq c(i_l)\}.$$
Now let $X(i_l) \sim \text{Binomial}(s-1, p(i_l))$ and $X_k(i_l) \sim \text{Binomial}(s-1, p_k(i_l))$, $k \in \{1, \dots, n-1\}$, be independent random variables, and let $f^* = \left(\frac{c}{100}\right) s$, in which $c$ is the CI. Then, under $H_0$,
$$\mathbb{P}\{X_{i_l}(e_j) = c(i_l), M_{i_l} = c(i_l)\} = \mathbb{P}\{M_{i_l} = c(i_l) \mid X_{i_l}(e_j) = c(i_l)\}\, \mathbb{P}\{X_{i_l}(e_j) = c(i_l)\} = \mathbb{P}\{X(i_l) \ge f^*\}\, p(i_l)$$
and
$$\mathbb{P}\{X_{i_l}(e_j) = M_{i_l}, M_{i_l} \neq c(i_l)\} = \sum_{\substack{k=1 \\ k \neq c(i_l)}}^{n} \mathbb{P}\{X_{i_l}(e_j) = k, M_{i_l} = k\} = \sum_{\substack{k=1 \\ k \neq c(i_l)}}^{n} \mathbb{P}\{M_{i_l} = k \mid X_{i_l}(e_j) = k\}\, \mathbb{P}\{X_{i_l}(e_j) = k\} = \sum_{k=1}^{n-1} \mathbb{P}\{X_k(i_l) \ge f^*\}\, p_k(i_l).$$
Hence,
$$\mathbb{P}\{W_{i_l}(e_j) = 1\} = \mathbb{P}\{X(i_l) \ge f^*\}\, p(i_l) + \sum_{k=1}^{n-1} \mathbb{P}\{X_k(i_l) \ge f^*\}\, p_k(i_l)$$
which does not depend on $e_j$ and the result follows. ☐
It is important to note that if all $p(i_l)$ are approximately 1, then $\mathbb{P}\{W_{i_l}(e_j) = 1\} \approx p(i_l)\, \mathbb{P}\{X(i_l) \ge f^*\}$ and the hypothesis that is actually being tested is the first part of $H_0$. Therefore, it is reasonable to test $H_0$ in order to determine if all the specialists have the same capability to judge the items, as, if it is indeed true, we expect that all $p(i_l)$ are large and the second part of $H_0$ will hardly lead to the rejection of $H_0$ when the capability is the same.
This test may be used as a diagnostic for the content analysis of items. If $H_0$ is not rejected, then there is no evidence that the capabilities of the specialists are different. However, if $H_0$ is rejected, we do not know if it is the first or the second part (or both) of $H_0$ that is not being satisfied by the judgement of the specialists. Nevertheless, we may disregard their judgement in any case, as either their capabilities are not the same or they are the same, but some $p(i_l)$ are small, which led to the rejection of $H_0$ by its second part.

6. Hypothesis Testing

The Cochran’s Q test may be applied to the random sample $\{w_{i_l}(e_j) : i_l \in I^*, e_j \in E\}$ determined from $\{x_{e_j} : e_j \in E\}$ as a way to test $H_0$ [12]. The assumptions of the Cochran’s Q test, using the notation of this paper, are as follows:
(a) The items of $I^*$ were randomly selected from the items that form the universe $U$ that the instrument aims to assess.
(b) The random variables $\{W_{i_l}(e_j) : i_l \in I^*, e_j \in E\}$ are dichotomous.
(c) The random variables $\{W_{i_l}(e_j) : i_l \in I^*\}$ are independent.
The Cochran’s Q test is used in applications in which treatments are applied independently to blocks (subjects) and the result of each treatment application is either a success or a failure (one or zero) [13]. In our case, the items may be seen as the blocks and the specialists as the treatments. What the Cochran’s Q test evaluates is if the treatments are all equally effective or, in our case, if the specialists are all equally capable of judging the items (which is equivalent to testing if the random variables $\{W_{i_l}(e_j) : e_j \in E\}$ are identically distributed for all $i_l \in I^*$). Therefore, if we reject the null hypothesis, we conclude that $\{W_{i_l}(e_j) : e_j \in E\}$ are not identically distributed for all $i_l \in I^*$ and, by Proposition 2, $H_0$ is also rejected. Thus, the hypothesis tested by the Cochran’s Q test is indeed $H_0$.
The statistic of the test is calculated from Table 1, in which $I^* = \{i_1^*, \dots, i_v^*\}$, and may be expressed as
$$Q = \frac{\sum_{r=1}^{s} s(s-1)\left(D_r - \frac{N}{s}\right)^2}{\sum_{l=1}^{v} R_l\,(s - R_l)}$$
The exact distribution of the Q statistic may be calculated by the method presented by [14], although a large sample approximation may be used instead. If $|I^*|$ is large, then the distribution of Q is approximately $\chi^2$ with $(s-1)$ degrees of freedom [13].
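As an illustration of the statistic and of its large-sample approximation, the following Python sketch computes Q and the $\chi^2$ p-value from a 0/1 concordance table. It assumes, following the usual layout of the Cochran’s Q test, that $D_r$ is the column total of specialist $r$, $R_l$ the row total of item $l$ and $N$ the grand total; the function names `chi2_sf` and `cochran_q` and the toy table are hypothetical, and the exact distribution of [14] is not implemented.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf(x, df):
    """Upper tail P{chi2_df > x}, using the closed forms for integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2.0) / i
            total += term
        return exp(-x / 2.0) * total
    # Odd df: normal tail plus a finite series in odd powers of sqrt(x).
    term, total = sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return erfc(sqrt(x / 2.0)) + sqrt(2.0 / pi) * exp(-x / 2.0) * total


def cochran_q(W):
    """Return (Q, p-value) for W[l][j] in {0, 1}: rows are items, columns specialists."""
    s = len(W[0])
    D = [sum(row[j] for row in W) for j in range(s)]  # column totals
    R = [sum(row) for row in W]                       # row totals
    N = sum(R)                                        # grand total
    denom = sum(r * (s - r) for r in R)
    if denom == 0:
        raise ValueError("Q is undefined when every item is judged unanimously")
    q = s * (s - 1) * sum((d - N / s) ** 2 for d in D) / denom
    return q, chi2_sf(q, s - 1)


# Toy table (hypothetical): four items judged by three specialists.
W = [[1, 1, 0], [1, 1, 0], [1, 1, 1], [1, 0, 1]]
q, p = cochran_q(W)
print(q, p)
```

Items on which all specialists agree contribute nothing to the denominator, so a table made only of unanimous rows leaves Q undefined; the sketch raises an error in that case rather than returning a value.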
It is worth mentioning that the random variables $\{W_{i_l}(e_j) : e_j \in E\}$ being identically distributed for all $i_l \in I^*$ does not imply that all the specialists have the same capability to judge the items, although there is no evidence that their capabilities are different. If there is no evidence that the capabilities of the specialists to judge the items are different, their judgement may be accepted.
If it is determined that the random variables $\{W_{i_l}(e_j) : e_j \in E\}$ are not identically distributed for all $i_l \in I^*$, then the judgement of the specialists is disregarded as $H_0$ is rejected. The items may be judged by different groups of specialists until they are judged by one in which all the specialists have the same capability to judge the items. These groups may be formed by new specialists or may be a subgroup of size $s^*$, $6 \le s^* < s$, of the specialists for which $H_0$ was rejected.
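The subgroup strategy can be sketched as follows. The helpers `subgroups` and `restrict` are hypothetical names, and the toy judgements are invented for illustration. The sketch only enumerates the candidate subgroups and extracts their judgements; it assumes that the exclusion condition and the test are then reapplied to each restricted table, which is one possible reading of the procedure.

```python
from itertools import combinations


def subgroups(n_specialists, min_size=6):
    """Yield index tuples of every subgroup with min_size <= size < n_specialists."""
    for size in range(min_size, n_specialists):
        yield from combinations(range(n_specialists), size)


def restrict(judgements, members):
    """Keep only the columns (specialists) listed in members."""
    return [[row[j] for j in members] for row in judgements]


# Toy data (hypothetical): two items judged by eight specialists.
judgements = [
    [1, 1, 2, 1, 1, 3, 1, 1],
    [2, 2, 2, 2, 1, 2, 2, 3],
]
groups = list(subgroups(8))
first = restrict(judgements, groups[0])
print(len(groups), groups[0], first)
```

Because many subgroups are tested on overlapping data, the resulting p-values are best read as a screening aid to be combined with a qualitative analysis, not as independent tests.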

7. Simulation Study

As the Cochran’s Q test is not a powerful one, i.e., its Type II error may be too great, a simulation study is conducted to estimate its power in some specific cases. The power of a statistical test is defined as the probability of $H_0$ being rejected when it is false and depends on the real scenario, i.e., on the real values of the parameters considered in $H_0$. Therefore, the power of the Cochran’s Q test in testing $H_0$ depends on the real capability of each specialist to judge the items, so the simulation study considers 10 distinct scenarios and is conducted as follows.
For each scenario, we simulate 50,000 judgements of the same items by the specialists and then determine the proportion of the simulations in which $H_0$ was rejected at a significance level, i.e., Type I error, of 5%. This proportion is regarded as an estimate of the power of the test in the considered scenario. A CI of 50% is used to determine $I^*$ in each simulation. The results of all 10 scenarios provide a wide picture of the power of the test, showing for which scenarios it is more powerful.
We consider in all scenarios nine specialists judging 30 items into three dimensions; this is the framework of the application in the next section. We also consider that the capability of each specialist is the same for all items, i.e., that $\mathbb{P}\{X_{i_l}(e_j) = c(i_l)\} = p_j$ for all $j \in \{1, \dots, 9\}$ and $l \in \{1, \dots, |I^*|\}$. Finally, we assume that $\mathbb{P}\{X_{i_l}(e_j) = k\} = (1 - p_j)/2$ for all $k \neq c(i_l)$, $j \in \{1, \dots, 9\}$ and $l \in \{1, \dots, |I^*|\}$. A pseudocode for the simulation of each scenario is presented in Algorithm 1. The scenarios and their estimated test power are displayed in Table 2.
Algorithm 1. Pseudocode that estimates the power of the Cochran’s Q test under a given scenario from 50,000 simulated judgements.
Ensure: rejected = 0
for simulation ∈ {1, ..., 50,000} do
    for j ∈ {1, ..., 9} do
        for l ∈ {1, ..., 30} do
            Simulate $X_{i_l}(e_j)$ from the Multinomial with parameter $\left(p_j, \frac{1 - p_j}{2}, \frac{1 - p_j}{2}\right)$ *
        end for
    end for
    Determine $I^*$ as the items such that at least 5 specialists agree on the dimension they assess
    Calculate $\{M_{i_l} : i_l \in I^*\}$
    Calculate $\{W_{i_l}(e_j) : i_l \in I^*, j \in \{1, \dots, 9\}\}$
    Calculate the statistic Q
    Calculate the p-value of Q
    if p-value < 0.05 then
        rejected = rejected + 1
    end if
end for
return rejected/50,000
* In scenarios 1 to 8. In scenarios 9 and 10 the Multinomial has parameter $(p_j, \xi_j, 1 - p_j - \xi_j)$, in which $\xi_j$ is simulated from a uniform distribution with range $[0, 1 - p_j]$.
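A minimal Python sketch of Algorithm 1 for scenarios 1 to 8 is given below. It is not the authors' R script: the function names, the default number of simulations (reduced for speed) and the capability vectors in the usage lines are illustrative assumptions, and the p-value is assumed to come from the large-sample $\chi^2$ approximation with $s - 1$ degrees of freedom.

```python
import random
from collections import Counter
from math import exp


def chi2_sf_even(x, df):
    """P{chi2_df > x} for even df (closed form); df = 8 when s = 9."""
    term = total = 1.0
    for i in range(1, df // 2):
        term *= (x / 2.0) / i
        total += term
    return exp(-x / 2.0) * total


def estimate_power(p, n_items=30, n_dims=3, n_sims=1000, alpha=0.05, seed=0):
    """Estimate the rejection rate of the Q test for capabilities p[j].

    The true dimension is chosen with probability p[j] and each wrong
    dimension with probability (1 - p[j]) / (n_dims - 1).
    """
    s = len(p)
    if s % 2 == 0:
        raise ValueError("this sketch needs an odd number of specialists")
    rng = random.Random(seed)
    rejected = 0
    for _ in range(n_sims):
        W = []
        for _ in range(n_items):
            # Dimension 0 plays the role of the true dimension of the item.
            row = [0 if rng.random() < pj else rng.randrange(1, n_dims) for pj in p]
            dim, votes = Counter(row).most_common(1)[0]
            if 2 * votes >= s:  # CI of 50%: at least half of the specialists agree
                W.append([int(x == dim) for x in row])
        R = [sum(r) for r in W]
        denom = sum(r * (s - r) for r in R)
        if denom == 0:
            continue  # Q undefined (no informative item): counted as not rejected
        N = sum(R)
        D = [sum(r[j] for r in W) for j in range(s)]
        q = s * (s - 1) * sum((d - N / s) ** 2 for d in D) / denom
        if chi2_sf_even(q, s - 1) < alpha:
            rejected += 1
    return rejected / n_sims


# Illustrative capability vectors (not the scenarios of Table 2).
equal = estimate_power([0.9] * 9, n_sims=200)
mixed = estimate_power([0.9] * 7 + [0.3] * 2, n_sims=200)
print(equal, mixed)
```

Simulations in which Q is undefined are counted as non-rejections here; this is a design choice of the sketch and may differ from the original script.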
On the one hand, we see in Table 2, that the power of the test is great when the majority of the specialists have the same high capability, while few specialists have a low capability, as is the case for scenarios 1, 2 and 3. This is also the case for scenario 8, when the specialists have different capability and there are specialists whose capability is very low. On the other hand, the power of the test is quite low when some of the specialists have the same high capability, and the specialists with lower capability are almost as capable as them, as is the case for scenarios 4, 5 and 6.
In scenario 7, we see that the power of the test is low when the majority of specialists have the same low capability (0.3 in this case). This happens because the specialists hardly agree on the dimension that each item assesses (as some of them are not capable), so many items are excluded by the CI and, for the items that remain, the less capable specialists agree with the highly capable ones, so it seems that they have high capability. Indeed, in scenario 7, the mean number of non-excluded items is the lowest of all scenarios, so a low concordance among the specialists is evidence of the existence of specialists of low capability, given that the items were well constructed.
Finally, as pointed out in the Mathematical Deduction section, we see in scenarios 9 and 10 that the hypothesis that is actually being tested when all the specialists are highly and equally capable is the first part of $H_0$, as the power of the test is close to the Type I error, which must be the case if the hypothesis is true.
The simulation study sheds light on some interesting facts about the proposed method in the considered scenarios. On the one hand, if the majority of the specialists have a homogeneous high capability, and few specialists have a very low capability, or if the capability of the specialists is highly heterogeneous, then the power of the test is great. However, if the specialists all have high, but different, capability, then the power of the test is low. On the other hand, if the majority of the specialists have a low capability, then a great number of items is excluded by the CI and, given that the items were well constructed, we may conclude that the specialists have low capability of judging the items, even though the power of the test is low. Finally, if only the first part of $H_0$ is being satisfied, and the capability of the specialists is high, then the power of the test is low and, therefore, the hypothesis that is really being tested is the first part of $H_0$.

8. Application: Perception about the Evaluation of the Teaching-Learning

In this section, we apply the developed method to a real validation process, in order to analyse the content of items of an instrument that aims to assess the perception of teachers and students of higher education institutions about the teaching-learning process; this is a construct that may be divided into three dimensions: process (P), judgement (J) and teaching-learning (T).
The evaluation of teaching-learning is a process, as it must have a well defined beginning, middle and end and must have a continuous, cumulative and systematic character. Indeed, it is a systematic mechanism for gathering information over time, with well defined levels, which characterises it as a process. Also, the evaluation of teaching-learning has a judgement dimension because it must issue a judgement of value or assign a score through the analysis of educational results obtained from the information gathered over time. Finally, the evaluation of teaching-learning has a teaching-learning dimension because, as indicated by its own name, it must not only evaluate the learning, but also the teaching: it should not only evaluate what the student has learnt, but also what the teacher has taught. Therefore, the evaluation of teaching-learning is a process of data gathering, in which an individual judges or is judged according to the teaching-learning.
In order to develop an instrument to assess this construct, 30 items were developed and sent to nine specialists, who judged the items according to the dimension that, in their opinion, each one assesses. The condition defined for excluding an item is the CI with $c = 50$. The judgements of the specialists are presented in Table 3; the table for the Cochran’s Q test is displayed in Table 4 and a translation of the items, which were originally written in Portuguese, is presented in Appendix A.
The statistic of the Cochran’s Q test for the data in Table 4 is $Q = 13.8$ and the test p-value is 0.087, so there is no evidence that $H_0$ is not true, at a significance of 5%. Furthermore, as the majority of the specialists agreed on the dimension that 24 out of 30 (80%) items assess, we also do not have evidence that the capability of the specialists is low. Therefore, based on the proposed method, there is no reason to disregard the judgement of the specialists.
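As a check on the reported p-value, and assuming that it was obtained from the large-sample $\chi^2$ approximation with $s - 1 = 8$ degrees of freedom (the exact distribution could have been used instead), the upper tail of a $\chi^2$ distribution with an even number of degrees of freedom has a closed form, which gives
$$\mathbb{P}\{\chi^2_8 > 13.8\} = e^{-6.9}\left(1 + 6.9 + \frac{6.9^2}{2!} + \frac{6.9^3}{3!}\right) \approx 0.0010078 \times 86.4565 \approx 0.087,$$
in agreement with the value above.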
Nevertheless, in order to illustrate the proposed approach for the case in which $H_0$ is rejected, we apply the test to every subgroup of specialists of size $s^*$, $6 \le s^* < 9$, which amounts to 130 subgroups, and see for which subgroups the capability of the specialists is the same. From the 130 subgroups, for 29 of them $H_0$ was rejected at a significance of 5%. The Q statistic and the p-value for the 10 groups with the greatest p-values are displayed in Table 5. If $H_0$ had been rejected for the group of nine specialists, we could then look for a subgroup of these specialists for which $H_0$ is not rejected and, with the help of a qualitative analysis, we could choose a subgroup of these specialists instead of disregarding their judgement as a whole and sending the items to other specialists to judge.

9. Final Remarks

The Cochran’s Q test is not a powerful one, so the method must be used with caution. The validation of a psychometric instrument is a process that comprises various procedures, therefore it must not be restricted to content analysis of items and the method developed in this paper. It is important to apply other validation techniques, both qualitative and quantitative, to the instrument to properly validate it.
The method may be improved in order to further decrease the subjectivity of the content analysis of items, especially by the development of more powerful tests and the definition of other random variables that enable the comparison between the judgement of the specialists. This paper does not exhaust the subject, but presents a nonparametric statistical approach that aims to decrease the subjectivity of a subjective process and that may be applied not only to content analysis of items, but also to any statistical application that enables the definition of variables such as those of this paper.

Supplementary Materials

The R (A Language and Environment for Statistical Computing) script used in the simulation study and in the application section is available online at https://www.mdpi.com/2571-905X/1/1/1/s1.

Acknowledgments

We would like to thank Eduardo João Ribeiro dos Santos, Joaquim Armando Gomes Alves Ferreira and Maria Cristina Pereira Matos for their guidance on the Ph.D. thesis in which the instrument used in Section 8 was developed.

Author Contributions

D.M. wrote the paper and developed the statistical deduction and the simulation studies. N.R.M. developed the instrument and collected the data used in Section 8.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
  • CI: Concordance Index
  • CVR: Content Validity Ratio
  • CVI: Content Validity Index
  • P: Process
  • J: Judgement
  • T: Teaching-Learning

Appendix A. Items Developed in the Application Section

A translation of the constructed items, which are answered on a Likert scale, is presented below together with the theoretical dimension of each item. The items were originally written in Portuguese and are numbered as in Table 3.
  1. The evaluation is an instrument strategically used to help with difficulties (Process).
  2. The evaluation assumes a formative role in the teaching-learning process (Teaching-Learning).
  3. The proposed evaluation methods are fair and appropriate (Judgement).
  4. The time available for evaluation is sufficient (Process).
  5. The evaluation offers recovery strategies for students who have difficulties (Process).
  6. The instructions given for the assignments subjected to evaluation are useful (Process).
  7. The evaluation is a tool for punishing the student in some manner (Judgement).
  8. The evaluation is an essential tool for the teaching-learning process (Process).
  9. The evaluation is an essential tool for the understanding of the taught subject (Teaching-Learning).
  10. The evaluation is a process that ranks the students in some manner (Judgement).
  11. The evaluation is a process that, in a particular way, builds a hierarchy among the students (Judgement).
  12. The evaluation is a process that follows the student throughout his academic life (Process).
  13. The evaluation has different meanings for those who evaluate and for those who are evaluated (Judgement).
  14. The evaluation is used to find out where and how the teaching-learning may be improved (Process).
  15. The evaluation is a tool to reward the student in some manner (Judgement).
  16. The evaluation is a tool to diagnose the teaching-learning process (Process).
  17. The evaluation is a tool with technical and pedagogical characteristics (Teaching-Learning).
  18. The evaluation aims to identify how much the student has learnt in the subjects (Teaching-Learning).
  19. The evaluation aims to identify which paths to take to obtain knowledge (Teaching-Learning).
  20. The evaluation is a systematic evidence gathering process (Process).
  21. The evaluation is a process of outlining, obtaining and providing information that permits judging decision alternatives (Process).
  22. The evaluation is a process with continuous, cumulative and systematic, but not episodic, character (Process).
  23. To evaluate means to provide a value judgement or to assign a score to whoever is being evaluated (Judgement).
  24. The evaluation is a tool that permits inquiring to what extent the defined objectives are being achieved (Judgement).
  25. The evaluation has an authoritarian and classificatory role inside the process of teaching-learning (Judgement).
  26. The evaluation is an educational component that can facilitate the teaching-learning process (Teaching-Learning).
  27. The teaching-learning process and the evaluation are not isolated parts of the education process (Teaching-Learning).
  28. The evaluation identifies the more adequate path to make excellent teaching-learning feasible (Teaching-Learning).
  29. The evaluation stimulates the acts of teaching and learning as a simultaneous process (Teaching-Learning).
  30. The evaluation involves the intentional judgement of a process developed by an individual during his learning (Judgement).

References

  1. Law, K.S.; Wong, C.S.; Mobley, W.M. Toward a taxonomy of multidimensional constructs. Acad. Manag. Rev. 1998, 23, 741–755.
  2. Haynes, S.N.; Richard, D.; Kubany, E.S. Content validity in psychological assessment: A functional approach to concepts and methods. Psychol. Assess. 1995, 7, 238–247.
  3. Cronbach, L.J.; Meehl, P.E. Construct validity in psychological tests. Psychol. Bull. 1955, 52, 281–302.
  4. Cook, D.A.; Beckman, T.J. Current concepts in validity and reliability for psychometric instruments: Theory and application. Am. J. Med. 2006, 119.
  5. Aladwani, A.M.; Palvia, P.C. Developing and validating an instrument for measuring user-perceived web quality. Inf. Manag. 2002, 39, 467–476.
  6. Engel, S.G.; Wittrock, D.A.; Crosby, R.D.; Wonderlich, S.A.; Mitchell, J.E.; Kolotkin, R.L. Development and psychometric validation of an eating disorder-specific health-related quality of life instrument. Int. J. Eat. Disord. 2006, 39, 62–71.
  7. Lawshe, C.H. A quantitative approach to content validity. Pers. Psychol. 1975, 28, 563–575.
  8. Wynd, C.A.; Schmidt, B.; Schaefer, M.A. Two quantitative approaches for estimating content validity. West. J. Nurs. Res. 2003, 25, 508–518.
  9. Rubio, D.M.; Berg-Weger, M.; Tebb, S.S.; Lee, E.S.; Rauch, S. Objectifying content validity: Conducting a content validity study in social work research. Soc. Work Res. 2003, 27, 94–104.
  10. Polit, D.F.; Beck, C.T. The content validity index: Are you sure you know what’s being reported? Critique and recommendations. Res. Nurs. Health 2006, 29, 489–497.
  11. Polit, D.F.; Beck, C.T.; Owen, S.V. Is the CVI an acceptable indicator of content validity? Appraisal and recommendations. Res. Nurs. Health 2007, 30, 459–467.
  12. Cochran, W.G. The comparison of percentages in matched samples. Biometrika 1950, 37, 256–266.
  13. Conover, W. Practical Nonparametric Statistics; John Wiley & Sons: Hoboken, NJ, USA, 1998.
  14. Patil, K.D. Cochran’s Q test: Exact distribution. J. Am. Stat. Assoc. 1975, 70, 186–189.
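
Table 1. Table of the observed random sample.

| Item | Specialist $e_1$ | $\cdots$ | Specialist $e_s$ | Total |
|---|---|---|---|---|
| $i^*_1$ | $w_{i^*_1}(e_1)$ | $\cdots$ | $w_{i^*_1}(e_s)$ | $R_1 = \sum_{e_j \in E} w_{i^*_1}(e_j)$ |
| $\vdots$ | $\vdots$ | | $\vdots$ | $\vdots$ |
| $i^*_v$ | $w_{i^*_v}(e_1)$ | $\cdots$ | $w_{i^*_v}(e_s)$ | $R_v = \sum_{e_j \in E} w_{i^*_v}(e_j)$ |
| Total | $D_1 = \sum_{i_l \in I^*} w_{i_l}(e_1)$ | $\cdots$ | $D_s = \sum_{i_l \in I^*} w_{i_l}(e_s)$ | $N = \sum_{i_l \in I^*} \sum_{e_j \in E} w_{i_l}(e_j)$ |
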
Table 2. The estimated power of the test for each scenario.

| Scenario | Description | Items * | Power |
|---|---|---|---|
| 1 | $p_j = 0.9$, $j \neq 1$ and $p_1 = 0.45$ | 29.9 | 0.9936 |
| 2 | $p_j = 0.9$, $j \notin \{1, 2, 3\}$ and $p_1 = p_2 = p_3 = 0.45$ | 29.2 | 0.9999 |
| 3 | $p_j = 0.9$, $j \notin \{1, 2, 3\}$ and $p_1 = 0.45$, $p_2 = 0.35$, $p_3 = 0.25$ | 29.7 | 1 |
| 4 | $p_j = 0.9$, $j \neq 1$ and $p_1 = 0.8$ | 29.9 | 0.1601 |
| 5 | $p_j = 0.9$, $j \notin \{1, 2\}$ and $p_1 = p_2 = 0.8$ | 29.9 | 0.2421 |
| 6 | $p_j = 0.6$, $j \notin \{1, 2, 3\}$ and $p_1 = p_2 = p_3 = 0.75$ | 25.5 | 0.2342 |
| 7 | $p_j = 0.3$, $j \notin \{1, 2, 3\}$ and $p_1 = p_2 = p_3 = 0.75$ | 14.7 | 0.2778 |
| 8 | $(p_1, \dots, p_9) = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)$ | 17.3 | 0.9001 |
| 9 | $p_j = 0.9$, $\forall j$, but the second part of $H_0$ is not true | 29.9 | 0.0454 |
| 10 | $p_j = 0.6$, $\forall j$, but the second part of $H_0$ is not true | 23.1 | 0.0469 |

* The mean number of items not excluded by the CI.

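Table 3. Judgement of the specialists about each item, i.e., the sample $\{x_{e_j} : e_j \in E\}$.

| Item | Specialist 1 | Specialist 2 | Specialist 3 | Specialist 4 | Specialist 5 | Specialist 6 | Specialist 7 | Specialist 8 | Specialist 9 | Dimension * | Theoretical Dimension |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | T | P | P | T | T | T | T | P | P | T | P |
| 2 | P | J | T | T | P | P | T | P | T | - | T |
| 3 | T | T | J | P | P | P | P | J | J | - | J |
| 4 | J | P | P | P | P | P | P | J | J | P | P |
| 5 | T | J | T | T | T | T | J | T | P | T | P |
| 6 | P | T | T | P | T | T | T | P | T | T | P |
| 7 | J | J | J | J | J | J | J | J | J | J | J |
| 8 | T | P | P | T | P | P | T | P | T | P | P |
| 9 | J | P | T | J | T | T | P | P | P | - | T |
| 10 | J | J | J | J | J | J | J | J | J | J | J |
| 11 | J | T | J | J | J | J | J | J | J | J | J |
| 12 | P | T | T | P | P | P | P | J | P | P | P |
| 13 | J | P | P | T | J | J | J | J | T | J | J |
| 14 | T | T | T | T | P | P | T | P | T | T | P |
| 15 | J | P | J | J | J | J | J | J | J | J | J |
| 16 | T | P | T | T | P | P | J | P | T | - | P |
| 17 | P | P | P | P | T | T | P | T | P | P | T |
| 18 | J | T | T | T | P | P | J | T | J | - | T |
| 19 | T | T | T | T | P | P | T | J | P | T | T |
| 20 | P | P | P | T | P | P | P | J | J | P | P |
| 21 | J | J | J | J | J | J | J | J | P | J | P |
| 22 | P | P | P | P | P | P | T | T | P | P | P |
| 23 | J | J | J | J | J | J | J | J | J | J | J |
| 24 | T | J | P | T | P | P | T | J | T | - | J |
| 25 | J | J | J | J | J | J | J | J | J | J | J |
| 26 | T | P | T | T | P | P | P | P | T | P | T |
| 27 | P | P | P | T | P | P | T | J | P | P | T |
| 28 | T | P | P | J | P | P | J | P | T | P | T |
| 29 | T | T | P | T | P | P | P | P | T | P | T |
| 30 | T | J | J | J | T | T | J | J | P | J | J |

* The dimension that at least 50% of the specialists agree the item assesses.
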
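Table 4. Table for the Cochran’s Q test, i.e., the sample $\{w_{i_l}(e_j) : i_l \in I^*, e_j \in E\}$.

| Item | Specialist 1 | Specialist 2 | Specialist 3 | Specialist 4 | Specialist 5 | Specialist 6 | Specialist 7 | Specialist 8 | Specialist 9 | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 5 |
| 4 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 6 |
| 5 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 6 |
| 6 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 6 |
| 7 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 9 |
| 8 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 5 |
| 10 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 9 |
| 11 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 8 |
| 12 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 6 |
| 13 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 5 |
| 14 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 6 |
| 15 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 8 |
| 17 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 6 |
| 19 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 5 |
| 20 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 6 |
| 21 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 8 |
| 22 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 7 |
| 23 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 9 |
| 25 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 9 |
| 26 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 5 |
| 27 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 6 |
| 28 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 5 |
| 29 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 5 |
| 30 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 5 |
| Total | 17 | 17 | 20 | 16 | 20 | 20 | 19 | 14 | 12 | 155 |
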
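
As a worked check of the arithmetic, and assuming the usual form of Cochran’s Q statistic [12] written in the notation of Table 1, the totals of Table 4 (with $s = 9$, $N = 155$, $\sum_j D_j^2 = 2735$ and $\sum_i R_i^2 = 1053$) give:

```latex
Q = \frac{(s - 1)\left(s \sum_{j=1}^{s} D_j^2 - N^2\right)}{s N - \sum_{i=1}^{v} R_i^2}
  = \frac{8 \left(9 \cdot 2735 - 155^2\right)}{9 \cdot 155 - 1053}
  = \frac{4720}{342} \approx 13.80
```

With $s - 1 = 8$ degrees of freedom, the chi-squared approximation gives a p-value of roughly 0.087, so $H_0$ is not rejected at the 5% level for the full group of nine specialists.
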
Table 5. Result of the Cochran’s Q test for the subgroups of specialists with the greatest p-values.

| Specialists | Q | p-Value |
|---|---|---|
| (1,3,4,5,6,7) | 1.064 | 0.957 |
| (1,2,3,4,5,7) | 1.818 | 0.874 |
| (1,2,3,4,6,7) | 1.818 | 0.874 |
| (1,2,3,4,5,6) | 2.340 | 0.800 |
| (1,2,3,5,7,9) | 2.391 | 0.793 |
| (1,2,3,6,7,9) | 2.391 | 0.793 |
| (2,3,4,5,6,7) | 2.519 | 0.774 |
| (1,2,4,5,6,9) | 2.528 | 0.772 |
| (1,3,4,5,6,9) | 2.619 | 0.758 |
| (1,3,4,5,6,7,9) | 3.612 | 0.729 |
