Article

Smooth Information Criterion for Regularized Estimation of Item Response Models

by
Alexander Robitzsch
1,2
1
IPN—Leibniz Institute for Science and Mathematics Education, Olshausenstraße 62, 24118 Kiel, Germany
2
Centre for International Student Assessment (ZIB), Olshausenstraße 62, 24118 Kiel, Germany
Algorithms 2024, 17(4), 153; https://doi.org/10.3390/a17040153
Submission received: 15 March 2024 / Revised: 2 April 2024 / Accepted: 3 April 2024 / Published: 6 April 2024
(This article belongs to the Special Issue Supervised and Unsupervised Classification Algorithms (2nd Edition))

Abstract
Item response theory (IRT) models are frequently used to analyze multivariate categorical data from questionnaires or cognitive tests. To reduce model complexity in item response models, regularized estimation is now widely applied: a nondifferentiable penalty function, such as the LASSO or the SCAD penalty, is added to the log-likelihood function in the optimization. In most applications, regularized estimation repeatedly fits the IRT model on a grid of regularization parameters $\lambda$, and the final model is selected as the one that minimizes the Akaike or Bayesian information criterion (AIC or BIC). In recent work, it has been proposed to instead directly minimize a smooth approximation of the AIC or the BIC. This approach circumvents the repeated estimation of the IRT model and thereby substantially reduces computation time. The adequacy of the new approach is demonstrated in three simulation studies covering regularized estimation for IRT models with differential item functioning, multidimensional IRT models with cross-loadings, and the mixed Rasch/two-parameter logistic IRT model. The simulation studies show that the computationally less demanding direct optimization based on the smooth variants of the AIC and BIC had comparable or improved performance relative to the ordinarily employed repeated regularized estimation based on the AIC or BIC.

1. Introduction

Item response theory (IRT; [1,2,3,4,5]) modeling is a class of statistical models for analyzing discrete multivariate data. In these models, a vector $X = (X_1, \ldots, X_I)$ of $I$ discrete variables $X_i$ ($i = 1, \ldots, I$; also referred to as items) is summarized by a unidimensional or multidimensional factor variable $\theta$. In this article, we confine ourselves to dichotomous random variables $X_i \in \{0, 1\}$.
The multivariate distribution of the vector $X \in \{0,1\}^I$ in the IRT model is defined as
$$P(X = x; \gamma) = \int \prod_{i=1}^{I} P(X_i = x_i \,|\, \theta; \gamma_i) \, f(\theta; \beta) \, \mathrm{d}\theta, \qquad (1)$$
where $\gamma = (\gamma_1, \ldots, \gamma_I, \beta)$ is the vector of model parameters. The vector $\gamma_i$ contains the item parameters of item $i$, while $\beta$ parametrizes the density $f$ of the factor variable $\theta$. Note that (1) includes a local independence assumption; that is, the items $X_i$ are conditionally independent given the factor variable $\theta$. The function $\theta \mapsto P(X_i = x_i \,|\, \theta; \gamma_i)$ is also referred to as the item response function (IRF; [6,7,8]). The two-parameter logistic (2PL) model [9] uses the IRF $\theta \mapsto \Psi(a_i \theta - b_i)$, where $\Psi$ denotes the logistic distribution function.
Now, assume that $N$ independent replications of $X$ are available. The parameter vector $\gamma$ can be estimated from the observations $x_1, \ldots, x_N$ by minimizing the negative log-likelihood function
$$l(\gamma) = -\sum_{n=1}^{N} \log P(X = x_n; \gamma), \qquad (2)$$
where the parameter vector $\gamma = (\gamma_1, \ldots, \gamma_H)$ contains $H$ components that have to be estimated.
In various applications, the IRT model (1) is not identified or includes too many parameters, which makes interpretation difficult. Therefore, a sparsity structure [10] is imposed on the model parameters $\gamma$. Regularized estimation, a machine learning technique, is employed in IRT models to make estimation feasible [11,12,13]. More formally, sparsity on $\gamma$ is imposed by replacing the negative log-likelihood function with the negative regularized log-likelihood function
$$l_{\mathrm{reg}}(\gamma; \lambda) = l(\gamma) + N \sum_{h=1}^{H} \iota_h \, P(\gamma_h, \lambda), \qquad (3)$$
where $\iota_h$ is an indicator variable for the parameter $\gamma_h$ that takes the value 1 if $\gamma_h$ is regularized (i.e., the sparsity assumption applies to this parameter) and 0 if $\gamma_h$ should not be regularized. Let $H_1 = \sum_{h=1}^{H} \iota_h$ be the number of regularized parameters and $H_0 = H - H_1$ the number of nonregularized model parameters. The regularized negative log-likelihood function $l_{\mathrm{reg}}$ defined in (3) includes a penalty function $P$ that encodes the assumptions about sparsity. For a scalar parameter $x$, the least absolute shrinkage and selection operator (LASSO; [14]) penalty is a popular choice, defined as
$$P_{\mathrm{LASSO}}(x, \lambda) = \lambda |x|, \qquad (4)$$
where $\lambda$ is a nonnegative regularization parameter that controls the extent of sparsity in the resulting parameter estimate. It is well known that the LASSO penalty introduces bias in estimated parameters. To circumvent this issue, the smoothly clipped absolute deviation (SCAD; [15]) penalty has been proposed:
$$P_{\mathrm{SCAD}}(x, \lambda) = \begin{cases} \lambda |x| & \text{if } |x| \leq \lambda \\[4pt] -\dfrac{x^2 - 2 a \lambda |x| + \lambda^2}{2(a-1)} & \text{if } \lambda < |x| \leq a\lambda \\[4pt] \dfrac{(a+1)\lambda^2}{2} & \text{if } |x| > a\lambda \end{cases} \qquad (5)$$
In many studies, the recommended value $a = 3.7$ (see [15]) has been adopted (e.g., [10,16]). Note that $P_{\mathrm{SCAD}}$ behaves like the LASSO penalty around zero but has a zero derivative for $x$ values that strongly differ from zero.
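As a concrete illustration, the SCAD penalty can be coded in a few lines. The sketch below is in Python rather than the R used for the article's simulations; the default $a = 3.7$ follows the recommendation above, while the values of $x$ and $\lambda$ are illustrative.

```python
import math

def scad_penalty(x, lam, a=3.7):
    """SCAD penalty P_SCAD(x, lambda), piecewise in |x| (Fan and Li)."""
    ax = abs(x)
    if ax <= lam:                       # LASSO-like region around zero
        return lam * ax
    elif ax <= a * lam:                 # quadratic transition region
        return -(ax ** 2 - 2.0 * a * lam * ax + lam ** 2) / (2.0 * (a - 1.0))
    else:                               # constant region: zero derivative
        return (a + 1.0) * lam ** 2 / 2.0

# The penalty matches the LASSO near zero but flattens out,
# so large effects are not shrunk further:
lam = 0.5
print(scad_penalty(0.1, lam))             # lam * |x| = 0.05 near zero
print(round(scad_penalty(2.0, lam), 4))   # 0.5875, constant beyond a * lam = 1.85
print(round(scad_penalty(3.0, lam), 4))   # 0.5875 again: no further growth
```

The flat third branch is what removes the LASSO's bias for large effects: beyond $|x| = a\lambda$, the penalty no longer grows, so large parameters are left unshrunk.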
A parameter estimate $\hat{\gamma}$ of the regularized IRT model is defined as the minimizer of $l_{\mathrm{reg}}$:
$$\hat{\gamma}(\lambda) = \operatorname{arg\,min}_{\gamma} \; l_{\mathrm{reg}}(\gamma; \lambda). \qquad (6)$$
Note that the penalty function $P$ involves a fixed tuning parameter $\lambda$; hence, the parameter estimate $\hat{\gamma}(\lambda)$ depends on $\lambda$. A crucial issue with the LASSO and SCAD penalty functions is that they are nondifferentiable because the function $x \mapsto |x|$ is nondifferentiable. Hence, particular estimation techniques for nondifferentiable optimization problems must be applied [14,17,18]. As an alternative, the nondifferentiable optimization function can be replaced by a differentiable approximation [19,20,21,22]. For example, the absolute value function $x \mapsto |x|$ in the SCAD penalty can be replaced with $x \mapsto \sqrt{x^2 + \varepsilon}$ for a sufficiently small $\varepsilon$ such as $\varepsilon = 0.001$. Using differentiable approximations has the advantage that ordinary gradient-based optimizers can be utilized.
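For instance, a minimal Python sketch of this differentiable approximation (the article's code is in R; the evaluation points are illustrative):

```python
import math

def smooth_abs(x, eps=0.001):
    """Differentiable approximation of |x|: sqrt(x^2 + eps)."""
    return math.sqrt(x * x + eps)

# Away from zero the approximation is essentially exact;
# at zero it is smooth, with value sqrt(eps) instead of 0:
print(round(smooth_abs(1.5), 4))   # ~1.5003
print(round(smooth_abs(0.0), 4))   # ~0.0316, i.e., sqrt(0.001)
```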
In practice, the estimation of the regularized IRT model is carried out on a grid $\Lambda = \{\lambda_1, \ldots, \lambda_T\}$ of $T$ values of $\lambda$. For each value $\lambda_t$ of the tuning parameter, a parameter estimate $\hat{\gamma}(\lambda_t)$ is obtained. A final parameter estimate $\hat{\gamma}$ is obtained by minimizing an information criterion
$$IC(\hat{\gamma}(\lambda)) = 2\, l(\hat{\gamma}(\lambda)) + K_N \Bigl( H_0 + \sum_{h=1}^{H} \iota_h \, \mathbf{1}(\hat{\gamma}_h(\lambda) \neq 0) \Bigr), \qquad (7)$$
where the factor $K_N$ is chosen as $K_N = 2$ for the Akaike information criterion (AIC; [23]) and $K_N = \log N$ for the Bayesian information criterion (BIC; [24]) (see [25]).
If the regularized likelihood function is evaluated with differentiable approximations, no regularized parameters are exactly equal to zero (in contrast to special-purpose optimizers for regularized estimation; [17]). Hence, estimated parameters $\gamma_h$ are counted as zero if they do not exceed a fixed threshold $\tau$ (such as 0.001, 0.01, or 0.02) in absolute value, and the approximated information criterion is computed as
$$\widetilde{IC}(\hat{\gamma}(\lambda)) = 2\, l(\hat{\gamma}(\lambda)) + K_N \Bigl( H_0 + \sum_{h=1}^{H} \iota_h \, \mathbf{1}(|\hat{\gamma}_h(\lambda)| > \tau) \Bigr). \qquad (8)$$
The final estimator of $\gamma$ is defined as
$$\hat{\gamma}_{IC} = \hat{\gamma}(\hat{\lambda}_{\mathrm{opt}}) \quad \text{with} \quad \hat{\lambda}_{\mathrm{opt}} = \operatorname{arg\,min}_{\lambda \in \Lambda} \widetilde{IC}(\hat{\gamma}(\lambda)). \qquad (9)$$
Depending on the chosen value of $K_N$, the regularized parameter estimate is thus based on either the AIC or the BIC.
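To make the indirect two-step procedure concrete, the following Python sketch applies it to a deliberately simple stand-in problem: a single normal mean with a LASSO penalty, for which the first-step estimate has a closed-form soft-threshold solution. The data, grid, and threshold $\tau$ are illustrative and not taken from the article.

```python
import math

# Toy stand-in for the IRT setting: a single normal mean with a LASSO penalty.
y = [1.8, 0.2, 1.4, 0.6, 1.2, 0.8, 1.6, 0.4, 1.0, 1.0]
N = len(y)
ybar = sum(y) / N

def soft_threshold(z, lam):
    """Closed-form LASSO estimate: argmin_m 0.5 * (z - m)**2 + lam * abs(m)."""
    return math.copysign(max(abs(z) - lam, 0.0), z)

def neg2_loglik(mu):
    """-2 log-likelihood of N(mu, 1) data, up to an additive constant."""
    return sum((v - mu) ** 2 for v in y)

tau = 0.02              # threshold for counting an estimate as non-zero
K_N = math.log(N)       # BIC factor K_N = log(N)
grid = [t / 100 for t in range(1, 100)]   # grid Lambda of lambda values

def bic(lam):
    """Step 2 criterion: -2 log-likelihood plus K_N per non-zero parameter."""
    mu = soft_threshold(ybar, lam)        # step 1: estimate at fixed lambda
    return neg2_loglik(mu) + K_N * (1 if abs(mu) > tau else 0)

lam_opt = min(grid, key=bic)
mu_hat = soft_threshold(ybar, lam_opt)
print(lam_opt, round(mu_hat, 2))   # the smallest lambda on the grid wins here
```

Here the first step is a one-line formula; in the IRT setting it is a full model fit per grid point, which is exactly the computational cost that the direct approach avoids.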
The ordinary approach to regularized estimation described above has the computational disadvantage that it requires sequentially fitting models on the grid $\Lambda$ of the regularization parameter $\lambda$. It is an indirect optimization approach: in a first step, a criterion function (i.e., the regularized likelihood function) is minimized with respect to $\gamma$ for each fixed value of $\lambda$, and in a second step, another criterion (i.e., the AIC or BIC) is minimized over $\lambda$. O'Neill and Burke [26] proposed an approach that directly minimizes a smooth version of the BIC (i.e., a smooth Bayesian information criterion, SBIC) for regression models. This direct estimation approach has been successfully implemented for structural equation models [21,27]. For these models, optimization based on the SBIC performed similarly to, if not better than, the ordinary estimation of regularized models based on the AIC and BIC. In this paper, we explore whether the smooth information criteria SBIC and the smooth Akaike information criterion (SAIC) also hold promise for various applications in IRT models. A computationally cheaper alternative for regularized estimation is probably even more important for IRT models than for structural equation models because IRT models are more difficult and more computationally demanding to estimate. To the best of our knowledge, this is the first attempt at using smoothed information criteria in IRT models.
The rest of this paper is structured as follows. The optimization using smooth information criteria is outlined in Section 2. Afterward, three applications of regularized IRT models are investigated in three simulation studies. Section 3 presents Simulation Study 1, which studies regularized estimation for differential item functioning. Section 4 presents Simulation Study 2, which investigates the regularized estimation of multidimensional IRT models. The last Simulation Study 3 in Section 5 is devoted to regularized estimation of the mixed Rasch/2PL model. Finally, this study closes with a discussion in Section 6.

2. Smooth Information Criterion

In theory, a parameter estimate $\hat{\gamma}$ for $\gamma$ of the IRT model may be obtained by directly minimizing an information criterion:
$$\hat{\gamma} = \operatorname{arg\,min}_{\gamma} \; 2\, l(\gamma) + K_N \Bigl( H_0 + \sum_{h=1}^{H} \iota_h \, \mathbf{1}(\gamma_h \neq 0) \Bigr). \qquad (10)$$
The optimization function in (10) can be interpreted as a regularized log-likelihood function with an $L_0$ penalty [28,29]. The indicator function $\mathbf{1}$ in (10) counts the number of regularized parameters that differ from zero. O'Neill and Burke [26] proposed substituting the indicator function with a suitable differentiable approximation $N_\varepsilon$, which yields a smooth information criterion such as the SAIC and the SBIC. In more detail, the differentiable approximation $N_\varepsilon$ of $\mathbf{1}$ is defined as
$$N_\varepsilon(x) = \frac{x^2}{x^2 + \varepsilon}, \qquad (11)$$
where $\varepsilon > 0$ is a sufficiently small tuning parameter, such as $\varepsilon = 0.001$. The function $N_\varepsilon$ takes values close to zero for arguments $x$ close to 0 and approaches 1 as $|x|$ moves away from 0. A smoothed information criterion $SIC(\gamma)$ (abbreviated as SIC) can then be defined as
$$SIC(\gamma) = 2\, l(\gamma) + K_N \Bigl( H_0 + \sum_{h=1}^{H} \iota_h \, N_\varepsilon(\gamma_h) \Bigr). \qquad (12)$$
We obtain the SAIC for the choice $K_N = 2$ in (12) and the SBIC for $K_N = \log(N)$.
Hence, the minimization problem (10) can be replaced by
$$\hat{\gamma} = \operatorname{arg\,min}_{\gamma} \; SIC(\gamma) = \operatorname{arg\,min}_{\gamma} \; 2\, l(\gamma) + K_N \Bigl( H_0 + \sum_{h=1}^{H} \iota_h \, N_\varepsilon(\gamma_h) \Bigr). \qquad (13)$$
The optimization in (13) directly minimizes a smoothed version of the information criterion.
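To make the direct approach concrete as well, the following Python sketch minimizes a smooth BIC for a single normal mean, a toy stand-in for one regularized IRT parameter. The data and tuning values are illustrative, and a fine grid scan over the parameter stands in for the gradient-based optimizer that the smoothness of the criterion permits.

```python
import math

def n_eps(x, eps=0.001):
    """Smooth indicator N_eps(x) = x^2 / (x^2 + eps): near 0 at x = 0, near 1 otherwise."""
    return x * x / (x * x + eps)

# Toy data: a single normal mean mu, standing in for one regularized IRT parameter.
y = [1.8, 0.2, 1.4, 0.6, 1.2, 0.8, 1.6, 0.4, 1.0, 1.0]
N = len(y)
K_N = math.log(N)   # SBIC factor; K_N = 2 would give the SAIC instead

def sbic(mu):
    """Smooth BIC: -2 log-likelihood plus K_N times the smooth parameter count."""
    return sum((v - mu) ** 2 for v in y) + K_N * n_eps(mu)

# Direct minimization over the model parameter itself; a fine grid scan
# stands in for a gradient-based optimizer. No second-stage lambda search needed.
mu_hat = min((m / 1000 for m in range(-2000, 2001)), key=sbic)
print(round(mu_hat, 3))   # close to the sample mean
```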

3. Simulation Study 1: Differential Item Functioning

In Simulation Study 1, the assessment of differential item functioning (DIF; [30,31,32]) is considered as an example. DIF occurs in datasets with multiple groups if item parameters are not invariant (i.e., not equal) across groups. This study treats the case of two groups in the unidimensional 2PL model. The IRF is given by
$$P(X_i = 1 \,|\, G = g, \theta) = \Psi\bigl( a_i \theta - b_i - \delta_i \, \mathbf{1}(G = 2) \bigr) \quad \text{for } g = 1, 2, \qquad (14)$$
where $\delta_i$ quantifies DIF in the item intercepts, which is also referred to as uniform DIF. The item parameters of item $X_i$ are given by $\gamma_i = (a_i, b_i, \delta_i)$. The mean and the standard deviation of $\theta$ in the first group are fixed to 0 and 1, respectively, for identification reasons. The mean $\mu_2$ and the standard deviation $\sigma_2$ of $\theta$ in the second group can then be estimated.
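The IRF in (14) is straightforward to evaluate; a small Python sketch with illustrative parameter values (not those of the simulation study):

```python
import math

def logistic(x):
    """Logistic distribution function Psi."""
    return 1.0 / (1.0 + math.exp(-x))

def irf_dif(theta, group, a, b, delta):
    """2PL IRF with uniform DIF: the intercept shifts by delta in group 2 only."""
    return logistic(a * theta - b - (delta if group == 2 else 0.0))

# With delta > 0, the item is harder for group 2 at the same theta:
a, b, delta = 1.5, 0.4, 0.6
p1 = irf_dif(0.0, 1, a, b, delta)   # group 1: Psi(-0.4)
p2 = irf_dif(0.0, 2, a, b, delta)   # group 2: Psi(-1.0)
print(round(p1, 3), round(p2, 3))   # 0.401 0.269
```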
It has been pointed out that additional assumptions about the DIF effects $\delta_i$ must be imposed for model identification [33,34,35]. Assuming a sparsity structure on the DIF effects is one plausible option. To this end, the DIF effects $\delta_i$ ($i = 1, \ldots, I$) are regularized, either in the optimization of the regularized log-likelihood function (3) or in the minimization of the SIC (13). Regularized estimation of DIF in IRT models has been widely discussed in the literature [36,37,38,39,40,41,42].

3.1. Method

In this simulation study, we used a data-generating model (DGM) similar to the one in the simulation study of [38]. The factor variable $\theta$ was assumed to be univariate normally distributed. We fixed the mean $\mu_1$ and the standard deviation $\sigma_1$ of $\theta$ in the first group to 0 and 1, respectively. In the second group, $\theta$ had a mean $\mu_2$ of 0.5 and a standard deviation $\sigma_2$ of 0.8. In total, $I = 25$ items were used in this simulation study.
We now describe the item parameters used for the IRF defined in (14). The common item discriminations $a_i$ of the 25 items were chosen as 1.3, 1.4, 1.5, 1.7, 1.6, 1.3, 1.4, 1.5, 1.7, 1.6, 1.3, 1.4, 1.5, 1.7, 1.6, 1.3, 1.4, 1.5, 1.7, 1.6, 1.3, 1.4, 1.5, 1.7, and 1.6. The item difficulties $b_i$ were chosen as −0.8, 0.4, 1.2, 2.0, −2.0, −0.8, 0.4, 1.2, 2.0, −2.0, −0.8, 0.4, 1.2, 2.0, −2.0, −0.8, 0.4, 1.2, 2.0, −2.0, −0.8, 0.4, 1.2, 2.0, and −2.0. The DIF effects $\delta_i$ were zero for the first 15 items, while Items 16 to 25 had non-zero DIF effects.
In the condition of small DIF effects (see [38]), we chose $\delta_i$ values of −0.60, 0.60, −0.65, 0.70, 0.65, −0.70, 0.60, −0.65, 0.70, and −0.65 for Items 16 to 25. In the condition of large DIF effects, we multiplied these values by 2. These two conditions are referred to as balanced DIF conditions because the DIF effects $\delta_i$ average to zero. In line with other studies, we also considered unbalanced DIF [43], taking the absolute values of the DIF effects from the small DIF and large DIF conditions. In the unbalanced DIF conditions, all DIF effects $\delta_i$ were therefore positive and did not average to zero. The item parameters can also be found at https://osf.io/ykew6 (accessed on 2 April 2024).
Moreover, we varied the sample size $N$ in this simulation study as 500, 1000, and 2000, with $N/2$ subjects in each of the two groups.
The regularized 2PL model with DIF was estimated with the regularized likelihood function using the SCAD penalty on a nonequidistant grid of 37 $\lambda$ values between 0.0001 and 1 (see the R simulation code at https://osf.io/ykew6; accessed on 2 April 2024). We approximated the nondifferentiable SCAD penalty function by its differentiable approximation with the tuning parameter $\varepsilon = 0.001$ and saved the parameter estimates that minimized the AIC and the BIC. Item parameters that did not exceed the threshold $\tau = 0.02$ in absolute value were counted as regularized to zero. In the direct minimization of the SAIC and SBIC, we tried the values 0.01, 0.001, and 0.0001 for the tuning parameter $\varepsilon$. Because $\varepsilon = 0.001$ performed best, we only report this solution.
As outcomes of the simulation study, we examined the (average) absolute bias and (average) root mean square error (RMSE) of the model parameter estimates as well as type-I error rates and power rates. Absolute bias and RMSE were computed for the estimates of the distribution parameters $\mu_2$ and $\sigma_2$ and for all estimates of the DIF effects $\delta_i$. Formally, let $\gamma_h$ be the $h$th parameter ($h = 1, \ldots, H$) in the model parameter vector $\gamma$, and let $\hat{\gamma}_{hr}$ be the estimate of $\gamma_h$ in replication $r$ ($r = 1, \ldots, R$). The absolute bias (abias) of the parameter estimate $\hat{\gamma}_h$ was computed as
$$\mathrm{abias}(\hat{\gamma}_h) = \Bigl| \frac{1}{R} \sum_{r=1}^{R} \hat{\gamma}_{hr} - \gamma_h \Bigr|. \qquad (15)$$
The RMSE was computed as
$$\mathrm{RMSE}(\hat{\gamma}_h) = \sqrt{ \frac{1}{R} \sum_{r=1}^{R} (\hat{\gamma}_{hr} - \gamma_h)^2 }. \qquad (16)$$
The average absolute bias and average RMSE were computed separately for DIF effects with true values of 0 (i.e., DIF effects for Items 1 to 15; non-DIF items) and for DIF effects different from 0 (i.e., DIF effects for Items 16 to 25; DIF items). The (average) type-I error rates were assessed for non-DIF items as the proportion of events in which an estimated DIF effect differed from zero (i.e., exceeded the threshold $\tau = 0.02$ in absolute value). The (average) power rates were determined for DIF items accordingly. More formally, the type-I error rate or power rate (abbreviated as “rate” in (17)) was determined by
$$\mathrm{rate}(\hat{\gamma}_h) = 100 \cdot \frac{1}{R} \sum_{r=1}^{R} \mathbf{1}(|\hat{\gamma}_{hr}| > \tau). \qquad (17)$$
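These three outcome measures can be computed from replication-level estimates in a few lines; a Python sketch with hypothetical estimates for a single parameter:

```python
import math

def abias(estimates, true_value):
    """Absolute bias: |mean of estimates - true value|."""
    return abs(sum(estimates) / len(estimates) - true_value)

def rmse(estimates, true_value):
    """Root mean square error over replications."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))

def rate(estimates, tau=0.02):
    """Percentage of replications with |estimate| > tau, as in (17):
    a type-I error rate for a true zero effect, a power rate otherwise."""
    return 100.0 * sum(1 for e in estimates if abs(e) > tau) / len(estimates)

# Hypothetical DIF-effect estimates over R = 5 replications of a non-DIF item:
est = [0.01, -0.03, 0.00, 0.05, -0.01]
print(round(abias(est, 0.0), 4))   # mean of estimates is 0.004
print(round(rmse(est, 0.0), 4))
print(rate(est))                   # 2 of 5 exceed tau = 0.02 -> 40.0
```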
Absolute bias values smaller than 0.03 were classified as acceptable in this simulation study. Moreover, type-I error rates smaller than 10.0 and power rates larger than 80.0 were seen as satisfactory.
In total, R = 750 replications were conducted in each of the 2 (small vs. large DIF) × 2 (balanced vs. unbalanced DIF) × 3 (sample size) = 12 cells of the simulation study. The entire simulation study was conducted with the R [44] statistical software. The estimation of the regularized IRT model was carried out using the sirt::xxirt() function in the R package sirt [45]. Replication material for the simulation study can be found at https://osf.io/ykew6 (accessed on 2 April 2024).

3.2. Results

Table 1 displays the average absolute bias and the average RMSE of the model parameters as a function of the extent of DIF and sample size $N$ for balanced and unbalanced DIF. The mean $\mu_2$ and the standard deviation $\sigma_2$ of the second group were estimated without bias in the balanced DIF condition. Moreover, while DIF effects for non-DIF items were estimated without bias, DIF effects for DIF items were biased at small to moderate sample sizes (i.e., for $N = 500$ and 1000). In general, regularized estimation based on the AIC and BIC behaved similarly to its smooth competitors SAIC and SBIC. However, the smooth information criteria had some advantages in smaller samples with respect to the RMSE. Note that the SAIC was the frontrunner in all balanced DIF conditions regarding the RMSE of the estimate of $\mu_2$.
In the unbalanced DIF condition, the estimated group means and DIF effects were generally biased. However, the bias decreased with increasing sample size and was smaller for large than for small DIF effects. The SBIC was the frontrunner in five out of six conditions for estimates of $\mu_2$ with respect to the RMSE; only for $N = 500$ and small DIF did the SAIC outperform the other estimators.
Table 2 presents average type-I error and power rates for the DIF effects of non-DIF and DIF items. The AIC and SAIC had inflated type-I error rates, whereas the BIC and SBIC had acceptable type-I error rates, except that the SBIC had an inflated rate for $N = 500$ in the unbalanced DIF condition with small DIF. Overall, the power rates of the regularized estimators based on the AIC and BIC were similar to those of their smooth alternatives SAIC and SBIC, with the SBIC slightly outperforming the BIC.

4. Simulation Study 2: Multidimensional Logistic Item Response Model

In Simulation Study 2, the multidimensional logistic IRT model [46] with cross-loadings is studied. Each item $X_i$ is allocated to a primary dimension $\theta_d$, but an item may also load on dimensions other than the primary dimension (i.e., the target factor variable). Formally, the IRF of the multidimensional logistic IRT model is given by
$$P(X_i = 1 \,|\, \theta) = \Psi\Bigl( \sum_{d=1}^{D} a_{id} \theta_d - b_i \Bigr), \qquad (18)$$
where $\theta = (\theta_1, \ldots, \theta_D)$. All item discriminations $a_{id}$ are regularized in the estimation, except those on the primary dimension. The means and standard deviations of the factor variables $\theta_d$ are fixed at 0 and 1, respectively, for identification reasons. The correlations between the dimensions can be estimated.
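For reference, the IRF in (18) can be evaluated as follows (Python sketch; the parameter values are illustrative, not taken from the study):

```python
import math

def logistic(x):
    """Logistic distribution function Psi."""
    return 1.0 / (1.0 + math.exp(-x))

def irf_mirt(theta, a, b):
    """Multidimensional logistic IRF: Psi(sum_d a_id * theta_d - b_i)."""
    return logistic(sum(ad * td for ad, td in zip(a, theta)) - b)

# An item loading mainly on dimension 1, with a small cross-loading on dimension 2:
p = irf_mirt(theta=[0.5, -1.0], a=[0.8, 0.3], b=0.4)
print(round(p, 3))   # Psi(0.4 - 0.3 - 0.4) = Psi(-0.3), about 0.426
```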
The regularized estimation of this model has been discussed in Refs. [47,48,49]. To ensure the identifiability of the model parameters, a sparse loading structure for the item discriminations $a_{id}$ is imposed. That is, most item discriminations are (approximately) zero in the DGM, and only a few loadings are allowed to differ from 0. Notably, regularized estimation of factor models can be regarded as an alternative to rotation methods in exploratory factor analysis [50,51].

4.1. Method

In this simulation study, we used a DGM with $I = 20$ items and $D = 2$ factor variables $\theta_1$ and $\theta_2$. The first 10 items loaded on the first dimension, while Items 11 to 20 loaded on the second dimension. The factor variable $(\theta_1, \theta_2)$ was bivariate normally distributed with standard normally distributed components and a fixed correlation $\rho$ of 0.5.
Moreover, we specified five cross-loadings. Items 2 and 6 had a cross-loading of size $\delta$ on the second dimension, while Items 13, 16, and 19 had a cross-loading of size $\delta$ on the first dimension. The DGM is visualized in Figure 1. In more detail, the $20 \times 2$ loading matrix $A$ containing the item discriminations $a_{id}$ (see (18)) is given, in transposed form, by
$$A^\top = \begin{pmatrix} 0.6 & 0.8 & 1.0 & 1.4 & 1.2 & 0.6 & 0.8 & 1.0 & 1.4 & 1.2 & 0 & 0 & \delta & 0 & 0 & \delta & 0 & 0 & \delta & 0 \\ 0 & \delta & 0 & 0 & 0 & \delta & 0 & 0 & 0 & 0 & 0.6 & 0.8 & 1.0 & 1.4 & 1.2 & 0.6 & 0.8 & 1.0 & 1.4 & 1.2 \end{pmatrix}. \qquad (19)$$
The size of the cross-loading $\delta$ was chosen as 0.3 (a small cross-loading) or 0.5 (a large cross-loading). The item difficulties $b_i$ (see (18)) of the 20 items were −0.8, 0.4, 1.2, 2.0, −2.0, −0.8, 0.4, 1.2, 2.0, −2.0, −0.8, 0.4, 1.2, 2.0, −2.0, −0.8, 0.4, 1.2, 2.0, and −2.0. The item parameters can also be found at https://osf.io/ykew6 (accessed on 2 April 2024).
We varied the sample size N as 500, 1000, and 2000, which may be interpreted as a small, moderate, and large sample size.
Like in Simulation Study 1, we compared the performance of regularized estimation based on AIC and BIC with the smooth alternatives SAIC and SBIC. A nonequidistant grid of 37 λ values between 0.0001 and 1 was chosen (see the R simulation code at https://osf.io/ykew6; accessed on 2 April 2024). The optimization functions were specified with the same tuning parameters for differentiable approximations as in Simulation Study 1 (see Section 3.1). (Average) absolute bias and (average) RMSE of model parameters, as well as type-I error rates and power rates for cross-loadings, were assessed for the four estimation methods.
In total, R = 750 replications were conducted in each of the 2 (small vs. large cross-loadings) × 3 (sample size) = 6 cells of the simulation study. The whole simulation study was conducted using the statistical software R [44]. The estimation of the regularized multidimensional logistic IRT model was carried out using the sirt::xxirt() function in the R package sirt [45]. Replication material for this simulation study can also be found at https://osf.io/ykew6 (accessed on 2 April 2024).

4.2. Results

Table 3 reports the (average) absolute bias and (average) RMSE of the estimated model parameters. The factor correlation $\rho$ was biased for the small and moderate sample sizes $N = 500$ and $N = 1000$. The bias was reduced with larger cross-loadings in large samples; however, a notable bias remained even at the large sample size $N = 2000$ when the BIC or SBIC was used. The AIC and SAIC outperformed the other criteria for estimates of $\rho$ with respect to bias and RMSE. Interestingly, the RMSE of the SAIC was substantially smaller than that of the AIC for the factor correlation $\rho$ as well as for true zero cross-loadings (i.e., rows “$CL = 0$” in Table 3) and non-zero cross-loadings (i.e., rows “$CL \neq 0$” in Table 3).
Table 4 shows type-I error rates and power rates of estimated cross-loadings. It is evident that AIC had inflated type-I error rates, while type-I error rates of SAIC, BIC, and SBIC were acceptable. Importantly, there were low power rates for BIC and SBIC, in particular for small cross-loadings. The SAIC estimation method may be preferred if the goal is detecting non-zero cross-loadings.

5. Simulation Study 3: Mixed Rasch/2PL Model

Recently, the mixed Rasch/2PL model [52] (see also [53]) has received some attention. The idea of this unidimensional IRT model is to find items that conform to the Rasch model [54] while allowing a subset of items to follow the more complex 2PL model [9]. The IRF of this model is given by
$$P(X_i = 1 \,|\, \theta) = \Psi\bigl( \exp(\alpha_i) \, \theta - b_i \bigr). \qquad (20)$$
Note that the IRF in (20) is just a reparametrized 2PL model with item discriminations $a_i = \exp(\alpha_i)$; hence, $\alpha_i = \log(a_i)$ is the logarithm of the item discrimination $a_i$. The case $\alpha_i = 0$ corresponds to the Rasch model because $a_i = \exp(\alpha_i) = 1$, while $\alpha_i \neq 0$ results in item discriminations different from 1. The mean of the factor variable $\theta$ is fixed to 0, while the standard deviation $\sigma$ is estimated.
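A small Python sketch of the IRF in (20) makes the reparametrization explicit (illustrative values; with $\alpha_i = 0$ it reduces exactly to the Rasch model):

```python
import math

def logistic(x):
    """Logistic distribution function Psi."""
    return 1.0 / (1.0 + math.exp(-x))

def irf_mixed(theta, alpha, b):
    """Mixed Rasch/2PL IRF: Psi(exp(alpha) * theta - b)."""
    return logistic(math.exp(alpha) * theta - b)

def irf_rasch(theta, b):
    """Rasch model IRF: Psi(theta - b), i.e., discrimination 1."""
    return logistic(theta - b)

# alpha = 0 gives discrimination exp(0) = 1, i.e., exactly the Rasch model:
print(irf_mixed(1.2, 0.0, 0.4) == irf_rasch(1.2, 0.4))   # True
# alpha = log(2) doubles the discrimination:
print(round(irf_mixed(1.2, math.log(2.0), 0.4), 3))      # Psi(2.0), about 0.881
```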
To achieve identifiability of the model parameters, a sparsity structure on the logarithms of the item discriminations $\alpha_i$ is imposed; hence, the majority of items are assumed to follow the Rasch model. Again, the sparsity structure is directly implemented in the regularized estimation of the mixed Rasch/2PL model.

5.1. Method

In this simulation study, we used $I = 20$ items for the DGM of the mixed Rasch/2PL model. The factor variable $\theta$ was assumed to be normally distributed with a zero mean and a standard deviation $\sigma = 1.2$. The item difficulties $b_i$ (see the IRF in (20)) of the 20 items were chosen as −0.8, 0.4, 1.2, 2.0, −2.0, −0.8, 0.4, 1.2, 2.0, −2.0, −0.8, 0.4, 1.2, 2.0, −2.0, −0.8, 0.4, 1.2, 2.0, and −2.0. The first 14 items followed the Rasch model (i.e., $\alpha_i = 0$ for $i = 1, \ldots, 14$). Items 15 to 20 followed the 2PL model, with three of these items having $\alpha_i = \delta$ and three having $\alpha_i = -\delta$. The size of $\delta$ controlled the deviation from the Rasch model; we chose $\delta$ as either $\log(1.4) = 0.336$ or $\log(2) = 0.693$, indicating small and large deviations from the Rasch model, respectively. Moreover, we manipulated the direction of the deviation from the Rasch model. While the $\alpha_i$ values in the conditions just described canceled out on average and thus represented a balanced deviation from the Rasch model (i.e., equal numbers of discriminations smaller and larger than 1), we also specified an unbalanced deviation in which Items 15 to 20 all had the value $\delta$. In this condition, we again studied small (i.e., $\delta = 0.336$) and large (i.e., $\delta = 0.693$) deviations from the Rasch model. Hence, with unbalanced deviations, items had discriminations of either 1 or larger than 1. The item parameters can also be found at https://osf.io/ykew6 (accessed on 2 April 2024).
Like in the other two simulation studies, we varied the sample size N as 500, 1000, and 2000.
Again, like in Simulation Studies 1 and 2, we compared the performance of regularized estimation based on the AIC and BIC with the smooth alternatives SAIC and SBIC. A nonequidistant grid of 33 $\lambda$ values between 0.001 and 1 was chosen (see the R simulation code at https://osf.io/ykew6; accessed on 2 April 2024). The optimization functions were specified with the same tuning parameters for the differentiable approximations as in Simulation Study 1 (see Section 3.1). The (average) absolute bias and (average) RMSE of the model parameters $\sigma$ and $\alpha_i$ ($i = 1, \ldots, I$), as well as type-I error rates and power rates for the logarithms of the item discriminations, were assessed.
Overall, $R = 750$ replications were conducted in each of the 2 (small vs. large deviations) × 2 (balanced vs. unbalanced deviations) × 3 (sample size) = 12 cells of the simulation study. This simulation study was also executed using the statistical software R [44]. Like in the other two simulation studies, the regularized mixed Rasch/2PL model was estimated using the sirt::xxirt() function in the R package sirt [45]. Replication material for this simulation study can also be found at https://osf.io/ykew6 (accessed on 2 April 2024).

5.2. Results

Table 5 contains the (average) absolute bias and (average) RMSE for the estimated model parameters. Notably, the pattern of findings differed between the conditions of balanced and unbalanced deviations from the Rasch model. In general, the SAIC performed well for the estimation of $\sigma$, except for small balanced deviations from the Rasch model with a sample size of $N = 500$. In most conditions, estimation based on the SBIC performed similarly to, if not better than, the BIC for the estimation of $\sigma$ in terms of RMSE.
Table 6 displays type-I error rates and power rates for the estimated logarithms of the item discriminations. In contrast to estimation based on the AIC, the SAIC had acceptable type-I error rates. Moreover, the power rates for detecting deviations from the Rasch model were much higher for the SAIC than for the BIC or SBIC.

6. Discussion

In this article, we compared the ordinarily employed indirect regularized estimation, which fits models on a grid of regularization parameters $\lambda$ and subsequently performs a discrete minimization of the AIC or BIC, with the direct minimization of the smooth information criteria SAIC and SBIC [26] for the estimation of regularized item response models. The direct SIC-based estimation methods showed comparable or, in many cases, better performance than the indirect regularized estimation methods based on the AIC and BIC. This is remarkable because SIC-based minimization is computationally much simpler, and ordinary gradient-based optimization routines can be utilized.
We studied the performance of the SAIC and SBIC in three simulation studies that focused on differential item functioning, (semi-)exploratory multidimensional IRT models, and the model choice between the Rasch model and the 2PL model. These three cases appear frequently in applications of regularized IRT models, which is why we chose them for our work.
In this article, we confined ourselves to analyzing dichotomous item responses and continuous factor variables. Future research could investigate the application of these techniques to polytomous item responses, count item response data [55], or cognitive diagnostic models that involve multivariate binary factor variables [56]. More generally, smooth information criteria can be used in all modeling approaches that involve regularized estimation. In econometrics or the social sciences, possible applications include (generalized) linear regression models [57], regularized panel models [58], and regularized estimation for analyzing heterogeneous treatment effects [59].
Notably, we did not investigate the estimation of standard errors in this article. Future research may apply the Huber–White variance estimation formula [60,61] to the subset of parameters with non-zero estimates [62].
Finally, two different targets in the analysis of item response models should be distinguished in regularized estimation. First, the selection or detection of non-zero effects such as cross-loadings or DIF effects may be the focus; for this goal, model selection based on information criteria can help control type-I error rates. Second, if the focus lies on structural parameters (such as group means or factor correlations), choosing a parsimonious model that penalizes the number of estimated parameters, as information criteria do, may not be beneficial in terms of the bias and variability of the structural parameters [21]. If structural parameters are of interest, it can instead be advantageous to use a sufficiently small regularization parameter $\lambda$ that ensures the empirical identifiability of the model without focusing on effect selection [63]. In this sense, sparsity in effects is imposed in a defensive way.

Funding

This research received no external funding.

Data Availability Statement

Supplementary material for the simulation studies can be found at https://osf.io/ykew6 (accessed on 2 April 2024).

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
2PL    two-parameter logistic
AIC    Akaike information criterion
BIC    Bayesian information criterion
DGM    data-generating model
DIF    differential item functioning
IRF    item response function
IRT    item response theory
LASSO  least absolute shrinkage and selection operator
ML     maximum likelihood
RMSE   root mean square error
SAIC   smooth Akaike information criterion
SBIC   smooth Bayesian information criterion
SCAD   smoothly clipped absolute deviation
SIC    smooth information criterion

References

  1. Baker, F.B.; Kim, S.H. Item Response Theory: Parameter Estimation Techniques; CRC Press: Boca Raton, FL, USA, 2004. [Google Scholar] [CrossRef]
  2. Bock, R.D.; Gibbons, R.D. Item Response Theory; Wiley: Hoboken, NJ, USA, 2021. [Google Scholar] [CrossRef]
  3. Chen, Y.; Li, X.; Liu, J.; Ying, Z. Item response theory—A statistical framework for educational and psychological measurement. arXiv 2021, arXiv:2108.08604. [Google Scholar]
  4. van der Linden, W.J.; Hambleton, R.K. (Eds.) Handbook of Modern Item Response Theory; Springer: New York, NY, USA, 1997. [Google Scholar] [CrossRef]
  5. Yen, W.M.; Fitzpatrick, A.R. Item response theory. In Educational Measurement; Brennan, R.L., Ed.; Praeger Publishers: Westport, CT, USA, 2006; pp. 111–154. [Google Scholar]
  6. van der Linden, W.J. Unidimensional logistic response models. In Handbook of Item Response Theory, Volume 1: Models; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 11–30. [Google Scholar]
  7. Reckase, M.D. Logistic multidimensional models. In Handbook of Item Response Theory, Volume 1: Models; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 189–210. [Google Scholar]
  8. Swaminathan, H.; Rogers, H.J. Normal-ogive multidimensional models. In Handbook of Item Response Theory, Volume 1: Models; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 167–187. [Google Scholar]
  9. Birnbaum, A. Some latent trait models and their use in inferring an examinee’s ability. In Statistical Theories of Mental Test Scores; Lord, F.M., Novick, M.R., Eds.; MIT Press: Reading, MA, USA, 1968; pp. 397–479. [Google Scholar]
  10. Fan, J.; Li, R.; Zhang, C.H.; Zou, H. Statistical Foundations of Data Science; Chapman and Hall/CRC: Boca Raton, FL, USA, 2020. [Google Scholar] [CrossRef]
  11. Goretzko, D.; Bühner, M. Note: Machine learning modeling and optimization techniques in psychological assessment. Psychol. Test Assess. Model. 2022, 64, 3–21. Available online: https://tinyurl.com/bdehjkzz (accessed on 2 April 2024).
  12. Finch, H. Applied Regularization Methods for the Social Sciences; Chapman and Hall/CRC: Boca Raton, FL, USA, 2022. [Google Scholar] [CrossRef]
  13. Jacobucci, R.; Grimm, K.J.; Zhang, Z. Machine Learning for Social and Behavioral Research; Guilford Publications: New York, NY, USA, 2023. [Google Scholar]
  14. Hastie, T.; Tibshirani, R.; Wainwright, M. Statistical Learning with Sparsity: The Lasso and Generalizations; CRC Press: Boca Raton, FL, USA, 2015. [Google Scholar] [CrossRef]
  15. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
  16. Zhang, H.; Li, S.J.; Zhang, H.; Yang, Z.Y.; Ren, Y.Q.; Xia, L.Y.; Liang, Y. Meta-analysis based on nonconvex regularization. Sci. Rep. 2020, 10, 5755. [Google Scholar] [CrossRef] [PubMed]
  17. Orzek, J.H.; Arnold, M.; Voelkle, M.C. Striving for sparsity: On exact and approximate solutions in regularized structural equation models. Struct. Equ. Model. 2023, 30, 956–973. [Google Scholar] [CrossRef]
  18. Zhang, S.; Chen, Y. Computation for latent variable model estimation: A unified stochastic proximal framework. Psychometrika 2022, 87, 1473–1502. [Google Scholar] [CrossRef] [PubMed]
  19. Battauz, M. Regularized estimation of the nominal response model. Multivar. Behav. Res. 2020, 55, 811–824. [Google Scholar] [CrossRef] [PubMed]
  20. Oelker, M.R.; Tutz, G. A uniform framework for the combination of penalties in generalized structured models. Adv. Data Anal. Classif. 2017, 11, 97–120. [Google Scholar] [CrossRef]
  21. Robitzsch, A. Implementation aspects in regularized structural equation models. Algorithms 2023, 16, 446. [Google Scholar] [CrossRef]
  22. Robitzsch, A. Model-robust estimation of multiple-group structural equation models. Algorithms 2023, 16, 210. [Google Scholar] [CrossRef]
  23. Cavanaugh, J.E.; Neath, A.A. The Akaike information criterion: Background, derivation, properties, application, interpretation, and refinements. WIREs Comput. Stat. 2019, 11, e1460. [Google Scholar] [CrossRef]
  24. Neath, A.A.; Cavanaugh, J.E. The Bayesian information criterion: Background, derivation, and applications. WIREs Comput. Stat. 2012, 4, 199–203. [Google Scholar] [CrossRef]
  25. Burnham, K.P.; Anderson, D.R. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach; Springer: New York, NY, USA, 2002. [Google Scholar] [CrossRef]
  26. O’Neill, M.; Burke, K. Variable selection using a smooth information criterion for distributional regression models. Stat. Comput. 2023, 33, 71. [Google Scholar] [CrossRef] [PubMed]
  27. Bollen, K.A.; Noble, M.D. Structural equation models and the quantification of behavior. Proc. Natl. Acad. Sci. USA 2011, 108, 15639–15646. [Google Scholar] [CrossRef] [PubMed]
  28. Oelker, M.R.; Pößnecker, W.; Tutz, G. Selection and fusion of categorical predictors with L0-type penalties. Stat. Model. 2015, 15, 389–410. [Google Scholar] [CrossRef]
  29. Shen, X.; Pan, W.; Zhu, Y. Likelihood-based selection and sharp parameter estimation. J. Am. Stat. Assoc. 2012, 107, 223–232. [Google Scholar] [CrossRef] [PubMed]
  30. Holland, P.W.; Wainer, H. (Eds.) Differential Item Functioning: Theory and Practice; Lawrence Erlbaum: Hillsdale, NJ, USA, 1993. [Google Scholar] [CrossRef]
  31. Mellenbergh, G.J. Item bias and item response theory. Int. J. Educ. Res. 1989, 13, 127–143. [Google Scholar] [CrossRef]
  32. Millsap, R.E. Statistical Approaches to Measurement Invariance; Routledge: New York, NY, USA, 2011. [Google Scholar] [CrossRef]
  33. Bechger, T.M.; Maris, G. A statistical test for differential item pair functioning. Psychometrika 2015, 80, 317–340. [Google Scholar] [CrossRef] [PubMed]
  34. Doebler, A. Looking at DIF from a new perspective: A structure-based approach acknowledging inherent indefinability. Appl. Psychol. Meas. 2019, 43, 303–321. [Google Scholar] [CrossRef] [PubMed]
  35. San Martin, E. Identification of item response theory models. In Handbook of Item Response Theory, Volume 2: Statistical Tools; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 127–150. [Google Scholar] [CrossRef]
  36. Belzak, W.; Bauer, D.J. Improving the assessment of measurement invariance: Using regularization to select anchor items and identify differential item functioning. Psychol. Methods 2020, 25, 673–690. [Google Scholar] [CrossRef] [PubMed]
  37. Belzak, W.C.M.; Bauer, D.J. Using regularization to identify measurement bias across multiple background characteristics: A penalized expectation-maximization algorithm. J. Educ. Behav. Stat. 2024. Epub ahead of print. [Google Scholar] [CrossRef]
  38. Chen, Y.; Li, C.; Ouyang, J.; Xu, G. DIF statistical inference without knowing anchoring items. Psychometrika 2023, 88, 1097–1122. [Google Scholar] [CrossRef] [PubMed]
  39. Robitzsch, A. Comparing robust linking and regularized estimation for linking two groups in the 1PL and 2PL models in the presence of sparse uniform differential item functioning. Stats 2023, 6, 192–208. [Google Scholar] [CrossRef]
  40. Schauberger, G.; Mair, P. A regularization approach for the detection of differential item functioning in generalized partial credit models. Behav. Res. Methods 2020, 52, 279–294. [Google Scholar] [CrossRef] [PubMed]
  41. Tutz, G.; Schauberger, G. A penalty approach to differential item functioning in Rasch models. Psychometrika 2015, 80, 21–43. [Google Scholar] [CrossRef]
  42. Wang, C.; Zhu, R.; Xu, G. Using lasso and adaptive lasso to identify DIF in multidimensional 2PL models. Multivar. Behav. Res. 2023, 58, 387–407. [Google Scholar] [CrossRef] [PubMed]
  43. Pohl, S.; Schulze, D.; Stets, E. Partial measurement invariance: Extending and evaluating the cluster approach for identifying anchor items. Appl. Psychol. Meas. 2021, 45, 477–493. [Google Scholar] [CrossRef] [PubMed]
  44. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation: Vienna, Austria, 2023; Available online: https://www.R-project.org/ (accessed on 15 March 2023).
  45. Robitzsch, A. sirt: Supplementary Item Response Theory Models. 2024. R Package Version 4.1-15. Available online: https://CRAN.R-project.org/package=sirt (accessed on 6 February 2024).
  46. Reckase, M.D. Multidimensional Item Response Theory Models; Springer: New York, NY, USA, 2009. [Google Scholar] [CrossRef]
  47. Chen, J. A partially confirmatory approach to the multidimensional item response theory with the Bayesian lasso. Psychometrika 2020, 85, 738–774. [Google Scholar] [CrossRef] [PubMed]
  48. Chen, Y.; Li, X.; Liu, J.; Ying, Z. Robust measurement via a fused latent and graphical item response theory model. Psychometrika 2018, 83, 538–562. [Google Scholar] [CrossRef] [PubMed]
  49. Sun, J.; Chen, Y.; Liu, J.; Ying, Z.; Xin, T. Latent variable selection for multidimensional item response theory models via L1 regularization. Psychometrika 2016, 81, 921–939. [Google Scholar] [CrossRef]
  50. Goretzko, D. Regularized exploratory factor analysis as an alternative to factor rotation. Eur. J. Psychol. Assess. 2023. Epub ahead of print. [Google Scholar] [CrossRef]
  51. Scharf, F.; Nestler, S. Should regularization replace simple structure rotation in exploratory factor analysis? Struct. Equ. Modeling 2019, 26, 576–590. [Google Scholar] [CrossRef]
  52. OECD. PISA 2015. Technical Report; OECD: Paris, France, 2017; Available online: https://bit.ly/32buWnZ (accessed on 2 April 2024).
  53. Wijayanto, F.; Mul, K.; Groot, P.; van Engelen, B.G.M.; Heskes, T. Semi-automated Rasch analysis using in-plus-out-of-questionnaire log likelihood. Brit. J. Math. Stat. Psychol. 2021, 74, 313–339. [Google Scholar] [CrossRef] [PubMed]
  54. Rasch, G. Probabilistic Models for Some Intelligence and Attainment Tests; Danish Institute for Educational Research: Copenhagen, Denmark, 1960. [Google Scholar]
  55. Beisemann, M.; Holling, H.; Doebler, P. Every trait counts: Marginal maximum likelihood estimation for novel multidimensional count data item response models with rotation or L1-regularization for simple structure. PsyArXiv 2024. [Google Scholar] [CrossRef]
  56. Chen, Y.; Liu, J.; Xu, G.; Ying, Z. Statistical analysis of Q-matrix based diagnostic classification models. J. Am. Stat. Assoc. 2015, 110, 850–866. [Google Scholar] [CrossRef] [PubMed]
  57. McNeish, D.M. Using lasso for predictor selection and to assuage overfitting: A method long overlooked in behavioral sciences. Multivar. Behav. Res. 2015, 50, 471–484. [Google Scholar] [CrossRef] [PubMed]
  58. Bai, J. Panel data models with interactive fixed effects. Econometrica 2009, 77, 1229–1279. [Google Scholar] [CrossRef]
  59. Imai, K.; Ratkovic, M. Estimating treatment effect heterogeneity in randomized program evaluation. Ann. Appl. Stat. 2013, 7, 443–470. [Google Scholar] [CrossRef]
  60. White, H. Maximum likelihood estimation of misspecified models. Econometrica 1982, 50, 1–25. [Google Scholar] [CrossRef]
  61. Boos, D.D.; Stefanski, L.A. Essential Statistical Inference; Springer: New York, NY, USA, 2013. [Google Scholar] [CrossRef]
  62. Huang, P.H. Penalized least squares for structural equation modeling with ordinal responses. Multivar. Behav. Res. 2022, 57, 279–297. [Google Scholar] [CrossRef] [PubMed]
  63. Asparouhov, T.; Muthén, B. Penalized structural equation models. Struct. Equ. Modeling 2023. Epub ahead of print. [Google Scholar] [CrossRef]
Figure 1. Simulation Study 2: Data-generating model with I = 20 items X_i (i = 1, …, 20) and two factor variables θ1 and θ2. Cross-loadings are depicted by red dashed lines.
Table 1. Simulation Study 1: (Average) absolute bias and average root mean square error (RMSE) of model parameters as a function of the extent of differential item functioning (DIF) and sample size N for balanced and unbalanced DIF.
                              (Average) Absolute Bias       (Average) RMSE
Par          DIF     N        AIC    SAIC   BIC    SBIC     AIC    SAIC   BIC    SBIC

Balanced DIF
μ2           small    500     0.001  0.004  0.000  0.003    0.104  0.090  0.113  0.104
             small   1000     0.002  0.000  0.001  0.001    0.069  0.064  0.075  0.070
             small   2000     0.001  0.001  0.000  0.000    0.048  0.046  0.046  0.047
             large    500     0.005  0.000  0.007  0.004    0.101  0.093  0.096  0.098
             large   1000     0.003  0.004  0.001  0.001    0.071  0.066  0.067  0.068
             large   2000     0.002  0.001  0.002  0.002    0.050  0.049  0.049  0.049
σ2           small    500     0.002  0.002  0.007  0.001    0.071  0.070  0.070  0.070
             small   1000     0.001  0.001  0.002  0.001    0.046  0.046  0.046  0.046
             small   2000     0.001  0.001  0.001  0.002    0.032  0.032  0.032  0.032
             large    500     0.003  0.001  0.000  0.002    0.068  0.067  0.067  0.067
             large   1000     0.002  0.003  0.002  0.002    0.045  0.044  0.044  0.044
             large   2000     0.001  0.001  0.001  0.001    0.035  0.035  0.035  0.035
δi (no DIF)  small    500     0.006  0.006  0.003  0.004    0.216  0.187  0.108  0.148
             small   1000     0.005  0.003  0.002  0.002    0.139  0.113  0.061  0.069
             small   2000     0.002  0.002  0.001  0.001    0.098  0.073  0.032  0.028
             large    500     0.003  0.005  0.003  0.002    0.201  0.188  0.082  0.142
             large   1000     0.006  0.004  0.001  0.002    0.140  0.115  0.047  0.067
             large   2000     0.006  0.002  0.001  0.001    0.098  0.073  0.031  0.030
δi (DIF)     small    500     0.025  0.024  0.182  0.077    0.349  0.330  0.496  0.398
             small   1000     0.006  0.008  0.062  0.041    0.211  0.213  0.315  0.276
             small   2000     0.003  0.003  0.010  0.011    0.137  0.135  0.155  0.157
             large    500     0.026  0.024  0.022  0.022    0.311  0.302  0.340  0.311
             large   1000     0.017  0.017  0.015  0.015    0.212  0.207  0.210  0.208
             large   2000     0.007  0.004  0.004  0.003    0.149  0.146  0.146  0.146

Unbalanced DIF
μ2           small    500     0.093  0.099  0.115  0.091    0.157  0.139  0.163  0.144
             small   1000     0.050  0.047  0.049  0.033    0.111  0.087  0.110  0.085
             small   2000     0.025  0.008  0.013  0.004    0.069  0.048  0.072  0.048
             large    500     0.053  0.072  0.024  0.020    0.145  0.122  0.143  0.100
             large   1000     0.024  0.030  0.003  0.002    0.084  0.076  0.077  0.068
             large   2000     0.004  0.012  0.001  0.000    0.053  0.059  0.048  0.048
σ2           small    500     0.004  0.004  0.001  0.003    0.067  0.067  0.067  0.067
             small   1000     0.000  0.000  0.001  0.000    0.048  0.048  0.048  0.048
             small   2000     0.000  0.000  0.001  0.000    0.032  0.031  0.031  0.031
             large    500     0.000  0.001  0.000  0.001    0.065  0.065  0.064  0.065
             large   1000     0.000  0.001  0.001  0.000    0.046  0.046  0.045  0.045
             large   2000     0.001  0.002  0.001  0.001    0.031  0.032  0.031  0.032
δi (no DIF)  small    500     0.114  0.108  0.035  0.062    0.296  0.251  0.170  0.207
             small   1000     0.072  0.055  0.028  0.016    0.207  0.150  0.143  0.095
             small   2000     0.037  0.013  0.015  0.001    0.122  0.079  0.098  0.023
             large    500     0.079  0.114  0.032  0.030    0.255  0.236  0.206  0.144
             large   1000     0.038  0.051  0.005  0.003    0.133  0.145  0.072  0.065
             large   2000     0.010  0.025  0.001  0.001    0.055  0.107  0.023  0.030
δi (DIF)     small    500     0.185  0.221  0.399  0.262    0.405  0.409  0.553  0.462
             small   1000     0.089  0.111  0.158  0.117    0.270  0.285  0.368  0.319
             small   2000     0.036  0.010  0.020  0.010    0.167  0.135  0.179  0.151
             large    500     0.075  0.112  0.037  0.029    0.339  0.312  0.354  0.303
             large   1000     0.036  0.050  0.005  0.004    0.212  0.199  0.207  0.196
             large   2000     0.011  0.020  0.004  0.004    0.138  0.144  0.137  0.138
Note. Par = parameter; μ2 = mean of θ in the second group; σ2 = standard deviation of θ in the second group; δi (no DIF) = DIF parameters with zero population values; δi (DIF) = DIF parameters with non-zero population values; Absolute bias values larger than 0.03 are printed in bold font.
Table 2. Simulation Study 1: Type-I error rate and power rate for DIF effects δ i as a function of the extent of differential item functioning (DIF) and sample size N for balanced and unbalanced DIF.
                 Type-I Error Rate              Power Rate
DIF      N       AIC    SAIC   BIC    SBIC      AIC    SAIC   BIC    SBIC

Balanced DIF
small     500    17.0   13.7    2.1    6.2      83.6   85.8   52.7   73.3
small    1000    14.4    8.2    1.4    2.1      97.0   96.1   81.5   87.3
small    2000    14.4    6.2    0.7    0.5      99.9   99.8   97.6   97.5
large     500    15.3   14.2    1.2    5.8      99.7   99.8   97.5   99.4
large    1000    15.0    9.0    0.8    2.0     100    100     99.9  100
large    2000    14.5    6.2    0.6    0.6     100    100    100    100

Unbalanced DIF
small     500    25.8   23.8    4.8   11.3      68.9   65.6   30.6   54.0
small    1000    21.0   15.8    5.1    3.7      89.4   85.4   71.3   79.9
small    2000    13.6    9.1    2.7    0.4      97.7   99.6   95.9   98.1
large     500    14.5   28.9    3.4    6.3      98.0   99.0   96.7   98.9
large    1000     9.6   22.4    0.6    2.0      99.8  100     99.7  100
large    2000     3.4   21.5    0.3    0.6     100    100    100    100
Note. Type-I error rates larger than 10.0 and power rates smaller than 80.0 are printed in bold font.
Table 3. Simulation Study 2: (Average) absolute bias and average root mean square error (RMSE) of model parameters as a function of the size of cross-loadings and sample size N.
                        Absolute Bias                RMSE
Par     CL       N      AIC    SAIC   BIC    SBIC    AIC    SAIC   BIC    SBIC
ρ       small    500    0.064  0.054  0.104  0.070   0.151  0.102  0.134  0.110
        small   1000    0.040  0.070  0.088  0.085   0.157  0.096  0.124  0.104
        small   2000    0.015  0.018  0.059  0.065   0.077  0.052  0.082  0.081
        large    500    0.055  0.062  0.136  0.073   0.167  0.109  0.179  0.117
        large   1000    0.029  0.054  0.074  0.057   0.144  0.100  0.124  0.099
        large   2000    0.010  0.014  0.016  0.014   0.062  0.047  0.058  0.051
CL = 0  small    500    0.041  0.016  0.015  0.008   0.243  0.113  0.130  0.095
        small   1000    0.025  0.028  0.008  0.014   0.188  0.119  0.092  0.088
        small   2000    0.011  0.006  0.006  0.003   0.102  0.062  0.049  0.034
        large    500    0.044  0.016  0.024  0.011   0.282  0.119  0.181  0.109
        large   1000    0.027  0.028  0.013  0.015   0.183  0.127  0.096  0.095
        large   2000    0.009  0.009  0.005  0.002   0.094  0.067  0.049  0.030
CL ≠ 0  small    500    0.102  0.141  0.238  0.177   0.287  0.287  0.306  0.296
        small   1000    0.075  0.118  0.211  0.192   0.237  0.235  0.288  0.273
        small   2000    0.021  0.033  0.132  0.154   0.140  0.146  0.231  0.243
        large    500    0.078  0.130  0.295  0.166   0.341  0.353  0.452  0.379
        large   1000    0.036  0.066  0.151  0.107   0.228  0.245  0.336  0.291
        large   2000    0.011  0.009  0.025  0.028   0.123  0.116  0.161  0.161
Note. Par = parameter; ρ = correlation between factors θ1 and θ2; CL = 0 = cross-loading with zero population value; CL ≠ 0 = cross-loading with non-zero population value; Absolute bias values larger than 0.03 are printed in bold font.
Table 4. Simulation Study 2: Type-I error rate and power rate for cross-loadings as a function of the size of cross-loadings and sample size N.
                 Type-I Error Rate              Power Rate
CL       N       AIC    SAIC   BIC    SBIC      AIC    SAIC   BIC    SBIC
small     500    16.7    5.1    2.1    2.5      41.2   30.6    9.4   22.4
small    1000    18.7    8.8    2.0    3.2      58.0   46.5   17.9   24.3
small    2000    15.1    6.2    1.7    0.7      86.5   82.4   44.0   38.3
large     500    18.5    4.8    3.1    2.8      68.5   59.5   27.3   52.0
large    1000    17.0   10.0    2.5    3.2      87.4   81.0   59.2   70.6
large    2000    13.3    7.8    1.3    0.7      98.6   98.7   92.3   92.6
Note. CL = size of cross-loadings; Type-I error rates larger than 10.0 and power rates smaller than 80.0 are printed in bold font.
Table 5. Simulation Study 3: (Average) absolute bias and average root mean square error (RMSE) of model parameters as a function of the sample size N and the extent and direction of deviations from the Rasch model.
                        (Average) Absolute Bias      (Average) RMSE
Par     Dev      N      AIC    SAIC   BIC    SBIC    AIC    SAIC   BIC    SBIC

Balanced deviations from the Rasch model
σ       small    500    0.053  0.038  0.004  0.010   0.099  0.082  0.070  0.072
        small   1000    0.062  0.005  0.012  0.004   0.107  0.054  0.051  0.056
        small   2000    0.109  0.001  0.025  0.009   0.167  0.032  0.044  0.037
        large    500    0.018  0.014  0.004  0.006   0.069  0.065  0.064  0.064
        large   1000    0.016  0.002  0.002  0.002   0.050  0.044  0.041  0.041
        large   2000    0.007  0.000  0.000  0.000   0.035  0.033  0.030  0.030
αi = 0  small    500    0.035  0.017  0.001  0.002   0.124  0.091  0.048  0.047
        small   1000    0.048  0.003  0.002  0.001   0.119  0.060  0.027  0.020
        small   2000    0.096  0.001  0.003  0.000   0.166  0.042  0.023  0.003
        large    500    0.018  0.012  0.002  0.002   0.101  0.082  0.046  0.044
        large   1000    0.017  0.003  0.001  0.000   0.066  0.058  0.023  0.014
        large   2000    0.007  0.001  0.000  0.000   0.033  0.040  0.010  0.003
αi ≠ 0  small    500    0.071  0.070  0.127  0.137   0.224  0.239  0.283  0.289
        small   1000    0.081  0.024  0.075  0.101   0.184  0.154  0.212  0.237
        small   2000    0.120  0.003  0.072  0.045   0.199  0.085  0.159  0.156
        large    500    0.014  0.017  0.015  0.018   0.207  0.215  0.230  0.231
        large   1000    0.019  0.006  0.006  0.008   0.139  0.136  0.138  0.143
        large   2000    0.004  0.004  0.003  0.004   0.095  0.095  0.093  0.094

Unbalanced deviations from the Rasch model
σ       small    500    0.008  0.032  0.084  0.083   0.076  0.081  0.110  0.109
        small   1000    0.015  0.008  0.028  0.055   0.051  0.050  0.061  0.078
        small   2000    0.011  0.003  0.002  0.015   0.036  0.032  0.031  0.040
        large    500    0.001  0.001  0.001  0.001   0.069  0.062  0.061  0.060
        large   1000    0.005  0.001  0.000  0.001   0.046  0.043  0.040  0.040
        large   2000    0.002  0.003  0.002  0.002   0.034  0.032  0.030  0.029
αi = 0  small    500    0.009  0.005  0.008  0.004   0.099  0.078  0.057  0.047
        small   1000    0.014  0.003  0.002  0.001   0.068  0.056  0.027  0.019
        small   2000    0.010  0.002  0.001  0.000   0.044  0.042  0.015  0.005
        large    500    0.003  0.003  0.001  0.001   0.105  0.079  0.046  0.041
        large   1000    0.004  0.001  0.001  0.000   0.069  0.053  0.025  0.013
        large   2000    0.003  0.001  0.001  0.000   0.047  0.039  0.014  0.002
αi ≠ 0  small    500    0.029  0.069  0.194  0.205   0.180  0.207  0.281  0.289
        small   1000    0.012  0.014  0.069  0.145   0.103  0.116  0.182  0.240
        small   2000    0.012  0.001  0.002  0.037   0.068  0.067  0.075  0.129
        large    500    0.008  0.006  0.007  0.005   0.129  0.126  0.127  0.127
        large   1000    0.011  0.007  0.006  0.006   0.091  0.089  0.089  0.089
        large   2000    0.003  0.003  0.003  0.003   0.062  0.061  0.061  0.061
Note. Par = parameter; Dev = size of deviation from the Rasch model; σ = standard deviation of factor variable θ; αi = 0 = logarithm of item discriminations with zero population value; αi ≠ 0 = logarithm of item discriminations with non-zero population value; Absolute bias values larger than 0.03 are printed in bold font.
Table 6. Simulation Study 3: Type-I error rate and power rate for the logarithm of item discriminations as a function of the sample size N and the extent and direction of deviations from the Rasch model.
                 Type-I Error Rate              Power Rate
Dev      N       AIC    SAIC   BIC    SBIC      AIC    SAIC   BIC    SBIC

Balanced deviations from the Rasch model
small     500    14.6    7.7    1.3    1.2      68.8   61.6   40.6   39.6
small    1000    18.5    6.5    0.7    0.3      77.6   85.5   63.6   55.9
small    2000    36.4    7.7    0.6    0.0      78.8   98.7   75.5   79.8
large     500    11.4    6.6    1.3    1.2      97.9   95.8   93.8   93.6
large    1000     9.3    6.6    0.6    0.2      99.7   99.8   99.4   98.9
large    2000     4.4    7.4    0.1    0.0     100    100    100    100

Unbalanced deviations from the Rasch model
small     500    11.4    5.7    1.8    1.1      80.4   69.5   30.9   29.9
small    1000    10.8    5.9    0.8    0.3      97.9   94.0   72.5   50.9
small    2000     8.3    7.7    0.4    0.1      99.9   99.9   98.2   87.3
large     500    13.7    5.9    1.2    1.0     100    100     99.9   99.9
large    1000    12.1    5.8    0.7    0.2     100    100    100    100
large    2000    10.9    7.3    0.4    0.0     100    100    100    100
Note. Dev = size of deviation from the Rasch model; Type-I error rates larger than 10.0 and power rates smaller than 80.0 are printed in bold font.