Article

Modeling Model Misspecification in Structural Equation Models

by Alexander Robitzsch 1,2
1 Centre for International Student Assessment (ZIB), IPN—Leibniz Institute for Science and Mathematics Education, 24118 Kiel, Germany
2 Centre for International Student Assessment (ZIB), 24118 Kiel, Germany
Stats 2023, 6(2), 689-705; https://doi.org/10.3390/stats6020044
Submission received: 10 May 2023 / Revised: 12 June 2023 / Accepted: 14 June 2023 / Published: 14 June 2023
(This article belongs to the Special Issue Advances in Probability Theory and Statistics)

Abstract

Structural equation models constrain mean vectors and covariance matrices and are frequently applied in the social sciences. Frequently, the structural equation model is misspecified to some extent. In many cases, researchers nevertheless intend to work with a misspecified target model of interest. In this article, a simultaneous statistical inference for sampling errors and model misspecification errors is discussed. A modified formula for the variance matrix of the parameter estimate is obtained by imposing a stochastic model for model errors and applying M-estimation theory. The presence of model errors is quantified in increased standard errors in parameter estimates. The proposed inference is illustrated with several analytical examples and an empirical application.

1. Introduction

Confirmatory factor analysis (CFA) and structural equation models (SEM) are statistical approaches to analyzing multivariate data in the social sciences [1,2,3,4,5,6,7]. These models relate a multivariate vector $X = (X_1, \ldots, X_I)^\top$ of $I$ observed (i.e., manifest) variables (also referred to as indicators or items) to a vector of latent variables (i.e., factors) $\eta$. SEMs impose constraints on the mean vector $\mu$ and the covariance matrix $\Sigma$ of the random vector $X$ as functions of an unknown parameter vector $\theta$. In particular, the mean vector is represented as $\mu(\theta)$, and the covariance matrix is represented as $\Sigma(\theta)$.
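SEMs, with CFA as a special case, define a measurement model that relates the observed variables $X$ to the latent variables $\eta$:
$$X = \nu + \Lambda\eta + \epsilon, \tag{1}$$
where $\nu$ denotes the vector of item intercepts, $\Lambda$ the matrix of factor loadings, and $\epsilon$ the vector of residuals.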
We denote the covariance matrix of the residuals by $\mathrm{Var}(\epsilon) = \Psi$. The random vectors $\eta$ and $\epsilon$ are assumed to be uncorrelated. In CFA, both follow multivariate normal (MVN) distributions, $\eta \sim MVN(\alpha, \Phi)$ and $\epsilon \sim MVN(0, \Psi)$. Hence, the mean vector and the covariance matrix in CFA can be written as
$$\mu(\theta) = \nu + \Lambda\alpha \quad\text{and}\quad \Sigma(\theta) = \Lambda\Phi\Lambda^\top + \Psi. \tag{2}$$
In SEM, relationships among the latent variables can be specified as regression models or path models
$$\eta = B\eta + \xi \quad\text{with}\quad E(\xi) = \alpha \ \text{ and } \ \mathrm{Var}(\xi) = \Phi, \tag{3}$$
where B denotes a matrix of regression coefficients. Hence, the mean vector and the covariance matrix are represented in SEM as
$$\mu(\theta) = \nu + \Lambda(I - B)^{-1}\alpha \quad\text{and}\quad \Sigma(\theta) = \Lambda(I - B)^{-1}\Phi\left[(I - B)^{-1}\right]^\top\Lambda^\top + \Psi, \tag{4}$$
where I denotes the identity matrix.
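As a minimal sketch of how (4) turns parameter matrices into model-implied moments, the following Python code evaluates $\mu(\theta)$ and $\Sigma(\theta)$ for a small two-factor model in which the second factor is regressed on the first. It assumes NumPy, and the function `implied_moments` and all numerical values are hypothetical illustrations, not part of the article or its software.

```python
import numpy as np

def implied_moments(nu, Lam, B, alpha, Phi, Psi):
    """Model-implied mean vector and covariance matrix following (4).

    nu: (I,) item intercepts; Lam: (I, D) loadings; B: (D, D) regression
    coefficients among latent variables; alpha, Phi: mean vector and covariance
    matrix of the latent residuals xi; Psi: (I, I) residual covariance matrix.
    """
    D = B.shape[0]
    T = np.linalg.inv(np.eye(D) - B)      # (I - B)^{-1}
    mu = nu + Lam @ T @ alpha
    Sigma = Lam @ T @ Phi @ T.T @ Lam.T + Psi
    return mu, Sigma

# Hypothetical example: eta_2 = 0.5 * eta_1 + xi_2
nu = np.zeros(4)
Lam = np.array([[1.0, 0.0], [0.8, 0.0], [0.0, 1.0], [0.0, 0.7]])
B = np.array([[0.0, 0.0], [0.5, 0.0]])
alpha = np.zeros(2)
Phi = np.diag([1.0, 0.75])
Psi = 0.5 * np.eye(4)
mu, Sigma = implied_moments(nu, Lam, B, alpha, Phi, Psi)
```

Setting `B` to a zero matrix reduces the computation to the CFA case (2).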
In practice, SEM parsimoniously parametrizes the mean vector and the covariance matrix using a parameter vector θ as a statistical summary. Such restrictions are unlikely to hold in practice, and model assumptions in SEM are only an approximation of a true data-generating model. In SEM, model deviations (e.g., model errors, model misspecification) in covariances emerge as a difference between a population covariance matrix Σ and a model-implied covariance matrix Σ ( θ ) (see Refs. [8,9]). Furthermore, there can be differences in the population mean vector μ and the model-implied mean vector μ ( θ ) .
This article addresses how to include model misspecification in statistical inference for parameter estimates. Wu and Browne [9,10] proposed an estimation approach that simultaneously models sampling errors and model errors. They do so by modifying the estimation function in the SEM and estimating the model with the maximum likelihood approach. Uanhoro [11] builds on the approach of Wu and Browne [9] but employs Bayesian (i.e., Markov chain Monte Carlo) estimation. Both approaches have in common that the presence of model errors is quantified in increased standard errors in parameter estimates. In this article, the estimation function in the SEM remains unchanged. We derive a simultaneous statistical inference regarding sampling and model errors based on M-estimation theory [12]. As a consequence, this article only addresses an alternative method of estimating standard errors in SEMs. The estimates of model parameters in the SEM are left unchanged.
The remainder of the article is organized as follows. Different estimation methods and standard error estimates with respect to sampling error are reviewed in Section 2. Section 3 presents a stochastic model for model errors and derives the extended variance formula that simultaneously addresses sampling errors and model errors. In Section 4, three analytical illustrative examples show how model errors are reflected in the variance of parameter estimates. Section 5 presents a numerical example using a survey dataset in which the proposed approach is applied. Finally, the article closes with a discussion in Section 6.

2. Estimating Structural Equation Models

In this section, we review different estimation methods for multiple-group SEMs. Note that some identification constraints at the population level must be imposed to estimate the SEM [2,13,14]. When modeling multivariate normally distributed data without missing data, the empirical mean vector x ¯ and the empirical covariance matrix S are sufficient statistics for an unknown mean vector μ and covariance matrix Σ . Hence, the statistics x ¯ and S are also sufficient for the parameter vector θ of the SEM.
Assume that there are $G$ groups with sample sizes $N_g$, empirical mean vectors $\bar x_g$, and empirical covariance matrices $S_g$ for groups $g = 1, \ldots, G$. The population mean vectors are denoted by $\mu_g$, and the population covariance matrices are denoted by $\Sigma_g$ ($g = 1, \ldots, G$). The model-implied mean vectors are denoted by $\mu_g(\theta)$ and the model-implied covariance matrices by $\Sigma_g(\theta)$. The parameter vector $\theta$ can contain parameters that are common across groups and parameters that are group-specific. For example, in a CFA, equal factor loadings and item intercepts across groups (i.e., measurement invariance; [15,16]) are imposed by assuming the same loading matrix $\Lambda$ and the same intercept vector $\nu$ across groups, while the mean vectors and covariance matrices of the factors are allowed to differ across groups.
The maximum likelihood (ML) function for the parameter θ in the SEM is given by the following (see Refs. [2,4]):
$$F_{ML}(\theta; \{\bar x_g\}, \{S_g\}) = -\sum_{g=1}^G \frac{N_g}{2}\left\{I\log(2\pi) + \log|\Sigma_g(\theta)| + \mathrm{tr}\left(S_g\Sigma_g(\theta)^{-1}\right) + (\bar x_g - \mu_g(\theta))^\top\Sigma_g(\theta)^{-1}(\bar x_g - \mu_g(\theta))\right\}, \tag{5}$$
where $\{\bar x_g\}$ and $\{S_g\}$ denote the sets of the empirical mean vectors and empirical covariance matrices for groups $g = 1, \ldots, G$, respectively. In practice, the model-implied covariance matrix can be misspecified [17,18,19,20], and $\theta$ is a pseudo-true parameter defined as the maximizer of the fitting function $F_{ML}$ in (5). Importantly, in this case, $\theta$ does not refer to a parameter of the data-generating model; it should be interpreted as a summary of the data that is of central interest to the researcher.
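To make the sign convention of (5) concrete, the sketch below evaluates the multiple-group ML fitting function for given model-implied moments. It assumes NumPy, and `f_ml` is a hypothetical helper. It returns the log-likelihood-type quantity that is maximized in $\theta$; an optimizer would typically minimize its negative.

```python
import numpy as np

def f_ml(groups, implied):
    """Multiple-group ML fitting function (5).

    groups: list of (N_g, xbar_g, S_g); implied: list of (mu_g, Sigma_g)
    evaluated at a given theta. Larger values indicate better fit.
    """
    total = 0.0
    for (N, xbar, S), (mu, Sigma) in zip(groups, implied):
        I = len(xbar)
        Sigma_inv = np.linalg.inv(Sigma)
        d = xbar - mu
        _, logdet = np.linalg.slogdet(Sigma)          # log|Sigma_g(theta)|
        total -= N / 2 * (I * np.log(2 * np.pi) + logdet
                          + np.trace(S @ Sigma_inv) + d @ Sigma_inv @ d)
    return total
```

For a saturated model with $\mu_g(\theta) = \bar x_g$ and $\Sigma_g(\theta) = S_g$, the function attains its maximum, so any restricted model yields a smaller value.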
A more general class of fitting functions in SEMs is weighted least squares (WLS) estimation [3,4,21]. The parameter vector θ is determined as the minimizer of
$$F_{WLS}(\theta; \{\bar x_g\}, \{S_g\}) = \sum_{g=1}^G(\bar x_g - \mu_g(\theta))^\top W_{1g}(\bar x_g - \mu_g(\theta)) + \sum_{g=1}^G(s_g - \sigma_g(\theta))^\top W_{2g}(s_g - \sigma_g(\theta)), \tag{6}$$
where the matrices $\Sigma_g$ and $S_g$ have been replaced by the vectors $\sigma_g$ and $s_g$ that collect all nonduplicated elements of the matrices. Formally, we denote this transformation by the vech operator, which defines $\sigma_g = \mathrm{vech}(\Sigma_g)$ and $s_g = \mathrm{vech}(S_g)$. The weight matrices $W_{1g}$ and $W_{2g}$ ($g = 1, \ldots, G$) can also depend on parameters that must be estimated prior to solving the estimation problem (6). Diagonally weighted least squares (DWLS) estimation results from choosing diagonal weight matrices $W_{1g}$ and $W_{2g}$. In this case, the fitting function can be written as
$$F_{DWLS}(\theta; \{\bar x_g\}, \{S_g\}) = \sum_{g=1}^G\sum_{i=1}^I w_{1gi}(\bar x_{gi} - \mu_{gi}(\theta))^2 + \sum_{g=1}^G\sum_{i=1}^I\sum_{j=i}^I w_{2gij}(s_{gij} - \sigma_{gij}(\theta))^2, \tag{7}$$
where $w_{1gi}$ and $w_{2gij}$ are the corresponding diagonal elements of $W_{1g}$ and $W_{2g}$, respectively. Unweighted least squares (ULS) estimation is obtained by setting all weights $w_{1gi}$ and $w_{2gij}$ equal to one.
Interestingly, the minimization of $F_{DWLS}$ in (7) with respect to the parameter $\theta$ can be viewed as a nonlinear least squares estimation problem with the sufficient statistics $\{\bar x_g\}$ and $\{S_g\}$ as input data [22]. It has been shown that ML estimation can be approximately written as DWLS estimation with particular weight matrices [23]. The weights are approximately given by $w_{1gi} = 1/u_{gi}^2$ and $w_{2gij} = 1/(u_{gi}^2 u_{gj}^2)$, where $u_{gi}^2 = \psi_{gii}/\sigma_{gii}$ are the sample standardized unique variances (see Ref. [23]).
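A minimal sketch of (7) for a single group, together with the approximate ML weights described above, is given below. It uses Python with NumPy, and `f_dwls` and `ml_like_weights` are hypothetical names. Passing arrays of ones as weights gives the ULS fitting function; summing `f_dwls` over groups gives the multiple-group version.

```python
import numpy as np

def f_dwls(xbar, S, mu, Sigma, w1, w2):
    """DWLS fitting function (7) for one group.

    w1: (I,) weights for the means; w2: (I, I) weights for the (co)variances,
    of which only the elements with j >= i are used, as in (7).
    """
    iu = np.triu_indices(len(xbar))                 # index pairs with j >= i
    mean_part = np.sum(w1 * (xbar - mu) ** 2)
    cov_part = np.sum(w2[iu] * (S[iu] - Sigma[iu]) ** 2)
    return mean_part + cov_part

def ml_like_weights(psi_diag, sigma_diag):
    """Approximate ML weights: w1_i = 1/u_i^2 and w2_ij = 1/(u_i^2 u_j^2),
    with standardized unique variances u_i^2 = psi_ii / sigma_ii."""
    u2 = np.asarray(psi_diag, dtype=float) / np.asarray(sigma_diag, dtype=float)
    return 1.0 / u2, 1.0 / np.outer(u2, u2)
```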
The fitting functions can be slightly more generally formulated as a sum of group-specific fitting functions
$$F(\theta, \hat\xi) = \sum_{g=1}^G F_g(\theta, \hat\xi_g), \tag{8}$$
where $\hat\xi_g = (\bar x_g^\top, s_g^\top)^\top$ denotes the vector of group-specific sufficient statistics ($g = 1, \ldots, G$). The parameter estimate $\hat\theta$ is obtained as the root of the partial derivative of $F$ in (8) with respect to $\theta$:
$$F_\theta(\theta, \hat\xi) = \sum_{g=1}^G F_{g,\theta}(\theta, \hat\xi_g) = 0, \tag{9}$$
where $F_\theta$ and $F_{g,\theta}$ denote the partial derivatives of $F$ and $F_g$ with respect to $\theta$, respectively. The parameter estimate $\hat\theta$ is a nonlinear function of the input vector of sufficient statistics $\hat\xi$. Hence, the distribution of $\hat\theta$ can be expressed as a function of the distribution of $\hat\xi$ by applying the multivariate delta method [17,24] (see also [25]).
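The asymptotic distribution of the vector of sufficient statistics $\hat\xi_g$ is given by
$$\hat\xi_g - \xi_g = \begin{pmatrix}\bar x_g \\ s_g\end{pmatrix} - \begin{pmatrix}\mu_g \\ \sigma_g\end{pmatrix} \sim MVN(0, V_g). \tag{10}$$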
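Under multivariate normality, the covariance matrix $V_g$ is determined by
$$V_g = N_g^{-1}\begin{pmatrix}\Sigma_g & 0 \\ 0 & 2K(\Sigma_g\otimes\Sigma_g)K^\top\end{pmatrix}, \tag{11}$$
where $\otimes$ denotes the Kronecker product and $K$ is a matrix with entries 0, 0.5, and 1 such that $\sigma_g = \mathrm{vech}(\Sigma_g) = K\,\mathrm{vec}(\Sigma_g)$, where the vec operator stacks all elements of a matrix into a vector. The factor 2 reflects that, for normally distributed data, $N_g\,\mathrm{Cov}(s_{gij}, s_{gkh}) \approx \sigma_{gik}\sigma_{gjh} + \sigma_{gih}\sigma_{gjk}$. The covariance matrix $V_g$ in (11) can be estimated by substituting the empirical covariance matrix $S_g$ for the population covariance matrix $\Sigma_g$:
$$\hat V_g = N_g^{-1}\begin{pmatrix}S_g & 0 \\ 0 & 2K(S_g\otimes S_g)K^\top\end{pmatrix}. \tag{12}$$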
The covariance matrix V = Var ( ξ ^ ) is given as a block-diagonal matrix of covariance matrices V g for g = 1 , , G :
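$$V = \begin{pmatrix}V_1 & 0 & \cdots & 0 \\ 0 & V_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & V_G\end{pmatrix}. \tag{13}$$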
A corresponding estimate V ^ is obtained by using estimates V ^ g ( g = 1 , , G ) for group-specific covariance matrices.
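The estimate (12) can be sketched in Python with NumPy as follows. The sketch assumes that vec stacks columns and that vech stacks the lower-triangular elements column by column, a common convention that the text does not spell out. The helper names `vech_K`, `vech`, and `V_hat` are hypothetical. For a single variable, the sketch reproduces the textbook result $\mathrm{Var}(s^2) \approx 2\sigma^4/N$.

```python
import numpy as np

def vech_K(I):
    """Matrix K with entries 0, 0.5, 1 such that vech(S) = K @ vec(S) for symmetric S.

    vec stacks columns (Fortran order); vech stacks the lower triangle column-wise.
    """
    pairs = [(i, j) for j in range(I) for i in range(j, I)]
    K = np.zeros((len(pairs), I * I))
    for row, (i, j) in enumerate(pairs):
        K[row, j * I + i] += 0.5
        K[row, i * I + j] += 0.5      # for i == j both halves hit the same entry
    return K

def vech(M):
    I = M.shape[0]
    return np.array([M[i, j] for j in range(I) for i in range(j, I)])

def V_hat(S, N):
    """Estimated asymptotic covariance (12) of (xbar_g, vech(S_g)) under normality."""
    I = S.shape[0]
    K = vech_K(I)
    cov_block = 2.0 * K @ np.kron(S, S) @ K.T
    p = cov_block.shape[0]
    V = np.zeros((I + p, I + p))
    V[:I, :I] = S
    V[I:, I:] = cov_block
    return V / N
```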
Assume that the population sufficient statistics are denoted by $\xi_0$ and that there exists a pseudo-true parameter $\theta_0$ such that $F_\theta(\theta_0, \xi_0) = 0$. Hence, $\theta_0$ is implicitly defined as a function of $\xi_0$. Note again that the parameter $\theta_0$ does not refer to a data-generating parameter; it is defined as a summary of the data by choosing a particular function $F$. Different pseudo-true parameters $\theta_0$ will be obtained for different choices of fitting functions $F$ in misspecified SEMs; specifically, $\theta_0$ is a function of $\xi_0$ and $F$ (i.e., $\theta_0 = g(\xi_0, F)$ for some function $g$).
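We now derive the covariance matrix of $\hat\theta$ by utilizing a Taylor expansion of $F_\theta$ around $(\theta_0, \xi_0)$. Denote by $F_{\theta\theta}$ and $F_{\theta\xi}$ the matrices of partial derivatives of $F_\theta$ with respect to $\theta$ and $\xi$, respectively (i.e., second-order partial derivatives of $F$). Then, we obtain
$$F_\theta(\hat\theta, \hat\xi) \approx F_\theta(\theta_0, \xi_0) + F_{\theta\theta}(\theta_0, \xi_0)(\hat\theta - \theta_0) + F_{\theta\xi}(\theta_0, \xi_0)(\hat\xi - \xi_0). \tag{14}$$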
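Because $F_\theta(\hat\theta, \hat\xi) = 0$ by (9) and $F_\theta(\theta_0, \xi_0) = 0$ by the definition of $\theta_0$, the Taylor expansion (14) provides the approximation
$$\hat\theta - \theta_0 = -F_{\theta\theta}(\theta_0, \xi_0)^{-1}F_{\theta\xi}(\theta_0, \xi_0)(\hat\xi - \xi_0). \tag{15}$$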
By defining $A = F_{\theta\theta}(\theta_0, \xi_0)$ and $B = F_{\theta\xi}(\theta_0, \xi_0)$, we obtain the multivariate delta formula [17,26]
$$\mathrm{Var}(\hat\theta) = A^{-1}BVB^\top(A^{-1})^\top. \tag{16}$$
The matrices $A$ and $B$ can be estimated by $\hat A = F_{\theta\theta}(\hat\theta, \hat\xi)$ and $\hat B = F_{\theta\xi}(\hat\theta, \hat\xi)$. The estimated covariance matrix $\mathrm{Var}(\hat\theta)$ in (16) can be used for statistical inference, such as computing standard errors or applying Wald tests.
The standard error (SE) of the $l$th entry $\hat\theta_l$ of $\hat\theta$ is given by
$$\mathrm{SE}(\hat\theta_l) = \sqrt{\mathrm{Var}(\hat\theta)_{ll}} = \sqrt{\left[A^{-1}BVB^\top A^{-1}\right]_{ll}}. \tag{17}$$
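The sandwich form of (16) and (17) can be illustrated with a toy case in which the delta method must reproduce a known answer. The toy case is the ULS estimate of a single mean, with $F = (\bar x - \theta)^2$. The sketch assumes NumPy, and `sandwich_cov` is a hypothetical name.

```python
import numpy as np

def sandwich_cov(A, B, V):
    """Delta-method covariance (16) and standard errors (17).

    A: (p, p) matrix F_theta_theta; B: (p, q) matrix F_theta_xi;
    V: (q, q) covariance matrix of the sufficient statistics.
    """
    A_inv = np.linalg.inv(A)
    cov = A_inv @ B @ V @ B.T @ A_inv.T
    return cov, np.sqrt(np.diag(cov))

# Toy example: F = (xbar - theta)^2, so F_theta = -2 (xbar - theta),
# A = 2, B = -2, and Var(xbar) = sigma^2 / N.
sigma2, N = 4.0, 100
cov, se = sandwich_cov(np.array([[2.0]]), np.array([[-2.0]]), np.array([[sigma2 / N]]))
```

Here, the sandwich recovers the familiar standard error $\sigma/\sqrt{N} = 0.2$.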
In this section, we computed the covariance matrix of the parameter estimate $\hat\theta$ with respect to sampling errors. In particular, we assumed a sampling scheme of independent and identically distributed observations, which leads to variability in the sufficient statistics $\hat\xi$ and, in turn, to variability in the estimated model parameters $\hat\theta$ across repeated samples. In the next section, we additionally incorporate model misspecification errors into the covariance matrix of $\hat\theta$. The presence of model misspecification should be reflected in increased standard errors. We achieve this by imposing a stochastic model on model specification errors.

3. Modeling Model Misspecification

In this section, we impose a stochastic model on model misspecification in the SEM. At the population level, the population mean vector $\mu$ can differ from the model-implied mean vector $\mu(\theta_0)$, and the (vectorized) population covariance matrix $\sigma$ can differ from the model-implied covariance matrix $\sigma(\theta_0)$. As in Section 2, we define the vector $\xi = (\mu^\top, \sigma^\top)^\top$, which contains all group-specific means and covariances.

3.1. Stochastic Model for Model Misspecification

Assume that there exists a θ 0 such that
$$\xi = \xi(\theta_0) + e, \tag{18}$$
where $e$ constitutes the model specification error. The vector $e$ contains deviations (i.e., model misspecification) in all means and pairwise covariances in all groups. Formally, we define $e_\mu = (e_{\mu,g,i})_{g=1,\ldots,G;\ i=1,\ldots,I}$, $e_\sigma = (e_{\sigma,g,ij})_{g=1,\ldots,G;\ 1 \le i < j \le I}$, and $e = (e_\mu^\top, e_\sigma^\top)^\top$. Assume that $E(e) = 0$.
To model misspecification in the mean structure, assume that the $e_{\mu,g,i}$ are normally distributed with zero mean and variance $\tau_\mu$ (i.e., $E(e_{\mu,g,i}) = 0$ and $\mathrm{Var}(e_{\mu,g,i}) = \tau_\mu$). All entries $e_{\mu,g,i}$ of the vector $e_\mu$ are independent and identically distributed.
To model misspecification in the covariance structure, we assume an effect decomposition of the error in the modeled covariance of items $i \ne j$ in group $g$:
$$e_{\sigma,g,ij} = u_{g,i} + u_{g,j} + v_{g,ij}, \tag{19}$$
where $u_{g,i}$ and $v_{g,ij}$ are uncorrelated random effects with zero means and variances $\mathrm{Var}(u_{g,i}) = \tau_{\sigma,2}$ and $\mathrm{Var}(v_{g,ij}) = \tau_{\sigma,1}$ for all items $i$ and all item pairs $(i,j)$, respectively. Model (19) is a cross-classified two-level model [27]. It fundamentally differs from the approach in [11], which assumes independent effects $e_{\sigma,g,ij}$ (i.e., there are no item effects $u_{g,i}$ for model errors in covariances). Because an item appears in several item pairs, and thus in several covariances, we find the inclusion of item effects $u_{g,i}$ in (19) more plausible. The appearance of $u_{g,i}$ and $u_{g,j}$ in (19) can be motivated by the fact that (intentionally) misspecified factor loadings of item $i$ enter all residuals $e_{\sigma,g,ij}$ with $j \ne i$ (see Appendix A). For multidimensional factor models, the stochastic model (19) could be generalized (see (A7) in Appendix A). As in [11], we set $e_{\sigma,g,ii} = 0$; that is, diagonal entries of the covariance matrix are assumed to be correctly specified at the population level. For $i \ne j$ and $k \ne h$, we obtain
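$$E(e_{\sigma,g,ij}\,e_{\sigma,g,kh}) = \begin{cases} 0 & \text{if } \mathrm{card}(\{i,j\}\cap\{k,h\}) = 0, \\ \tau_{\sigma,2} & \text{if } \mathrm{card}(\{i,j\}\cap\{k,h\}) = 1, \\ 2\tau_{\sigma,2} + \tau_{\sigma,1} & \text{if } \mathrm{card}(\{i,j\}\cap\{k,h\}) = 2, \end{cases} \tag{20}$$
where $\mathrm{card}(A)$ denotes the cardinality of a set $A$. For $i < j$ and $k < h$, the condition $\mathrm{card}(\{i,j\}\cap\{k,h\}) = 2$ means that $i = k$ and $j = h$; that is, $E(e_{\sigma,g,ij}^2) = 2\tau_{\sigma,2} + \tau_{\sigma,1}$.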
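If the covariance errors of one group are stacked over all item pairs $i < j$, the moment structure (20) can be written compactly as $\mathrm{Cov}(e_\sigma) = \tau_{\sigma,2}DD^\top + \tau_{\sigma,1}I$. Here, $D$ is the pair-by-item incidence matrix, so that $DD^\top$ counts the items two pairs share. This compact form is our own restatement, not taken from the article. The Python/NumPy sketch below builds the matrix (the function name `cov_e_sigma` is hypothetical) and checks it entry by entry against (20).

```python
import itertools
import numpy as np

def cov_e_sigma(I, tau2, tau1):
    """Covariance matrix of the errors e_{sigma,ij} (i < j) of one group implied
    by (19): e_ij = u_i + u_j + v_ij with Var(u_i) = tau2 and Var(v_ij) = tau1."""
    pairs = list(itertools.combinations(range(I), 2))
    D = np.zeros((len(pairs), I))            # pair-by-item incidence matrix
    for r, (i, j) in enumerate(pairs):
        D[r, i] = D[r, j] = 1.0
    return pairs, tau2 * D @ D.T + tau1 * np.eye(len(pairs))
```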

3.2. Estimating the Variance Components in the Stochastic Model

The model errors $e_{\mu,g,i}$ and $e_{\sigma,g,ij}$ are not directly observable. For statistical inference regarding the stochastic model for model misspecification, the variance components in (20) must be estimated. Because $e = \xi - \xi(\theta_0)$ cannot be computed, we use the empirical residuals $\hat e = \hat\xi - \xi(\hat\theta)$ instead. These residuals are included in the standard output of widely used SEM software [28,29,30].
One can compute the quantities $\hat e_{\mu,g,i}^2$ and $\hat e_{\sigma,g,ij}\hat e_{\sigma,g,kh}$ for all index combinations $(i,j,k,h)$ as estimates of $e_{\mu,g,i}^2$ and $e_{\sigma,g,ij}e_{\sigma,g,kh}$ and equate them with their expected values. This approach is referred to as the method of moments [31]. Define the vector $\tau = (\tau_\mu, \tau_{\sigma,2}, \tau_{\sigma,1})^\top$, and collect the empirical squares and cross-products in the vector $\hat z$. According to (20), the expected values of these cross-products are linear in $\tau$. Hence, the method of moments maps $\hat z$ to the vector of unknown variance components $\tau$ using the linear model
$$\hat z = H\tau + \varepsilon, \tag{21}$$
where $H$ is an appropriate known design matrix with entries 0, 1, or 2. The linear model (21) can be solved by least squares:
$$\hat\tau = \tilde H\hat z \quad\text{with}\quad \tilde H = (H^\top H)^{-1}H^\top. \tag{22}$$
Negatively estimated variances can be set to zero.
In the case of our defined variance component model for the mean and the covariance structure, simple formulas for the variance estimates can be derived. The variance τ μ can be estimated by
$$\hat\tau_\mu = \frac{1}{GI}\sum_{g=1}^G\sum_{i=1}^I \hat e_{\mu,g,i}^2. \tag{23}$$
Let $M_a$ ($a = 0, 1, 2$) denote the set of cross-products $\hat e_{\sigma,g,ij}\hat e_{\sigma,g,kh}$ of residuals within the same group $g$ with $\mathrm{card}(\{i,j\}\cap\{k,h\}) = a$, and let $Y_a$ be the average of the products in $M_a$. By (20), $E(Y_1) = \tau_{\sigma,2}$ and $E(Y_2) = 2\tau_{\sigma,2} + \tau_{\sigma,1}$. We therefore use the estimate $\hat\tau_{\sigma,2} = \max(0, Y_1)$. Finally, we compute $\hat\tau_{\sigma,1} = \max(0, Y_2 - 2\hat\tau_{\sigma,2})$.
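These simple moment estimators can be sketched in Python with NumPy as follows. The function name and input layout are assumptions made for illustration, not the interface of any SEM package. The sketch requires at least three items so that the set $M_1$ is nonempty. The test draws errors directly from (19), which is an idealization: fitted residuals would additionally carry sampling error.

```python
import itertools
import numpy as np

def mom_variance_components(E_mu, E_sigma):
    """Raw method-of-moments estimates (tau_mu, tau_sigma2, tau_sigma1).

    E_mu: (G, I) array of mean residuals; E_sigma: list of G symmetric (I, I)
    arrays of covariance residuals (diagonals are ignored). Requires I >= 3.
    """
    tau_mu = np.mean(np.asarray(E_mu, dtype=float) ** 2)     # equation (23)
    sums = {0: 0.0, 1: 0.0, 2: 0.0}
    counts = {0: 0, 1: 0, 2: 0}
    for R in E_sigma:                       # cross-products within a group only
        pairs = list(itertools.combinations(range(R.shape[0]), 2))
        for p, q in itertools.combinations_with_replacement(pairs, 2):
            a = len(set(p) & set(q))        # card({i,j} & {k,h})
            sums[a] += R[p] * R[q]
            counts[a] += 1
    Y = {a: sums[a] / counts[a] for a in sums if counts[a] > 0}
    tau_s2 = max(0.0, Y[1])
    tau_s1 = max(0.0, Y[2] - 2.0 * tau_s2)
    return tau_mu, tau_s2, tau_s1
```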
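However, $\hat z$ is affected by sampling error. We can write $\hat e - e \sim MVN(0, V_2)$, approximately. Because $\hat z$ contains products of normally distributed variables, its expected value exceeds the corresponding vector $z$ of products of the true model errors by a bias term $B$ due to sampling errors, which can be estimated from
$$E(\hat z) = B + z. \tag{24}$$
Then, we obtain the bias-corrected estimate from (22) and (24):
$$\hat\tau = \tilde H(\hat z - B). \tag{25}$$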
The bias B can also be determined by resampling techniques. The parameter estimates τ ^ μ , τ ^ σ , 2 , and τ ^ σ , 1 can be repeatedly computed from bootstrap samples of subjects. Then, a bootstrap bias of variance components can be determined [32]. As a result, bias-corrected variance component estimates can be computed. Again, negatively estimated variances are set to zero.
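A standard way to implement the bootstrap bias correction [32] estimates the bias as the average bootstrap estimate minus the original estimate and subtracts it from the original estimate. The sketch below assumes this standard form and truncates negative variances at zero. It uses Python with NumPy, and the function name and the numbers in the test are hypothetical.

```python
import numpy as np

def bootstrap_bias_corrected(tau_hat, tau_boot):
    """Bootstrap bias-corrected variance components, truncated at zero.

    tau_hat: (k,) estimates from the original sample;
    tau_boot: (R, k) estimates from R bootstrap samples of subjects.
    """
    tau_hat = np.asarray(tau_hat, dtype=float)
    bias = np.mean(np.asarray(tau_boot, dtype=float), axis=0) - tau_hat
    return np.maximum(0.0, tau_hat - bias)   # negative variances set to zero
```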

3.3. Error in Model Parameters Due to Model Misspecification

Having estimated the variance components $\tau$ of the model misspecification, we now compute the variance in the SEM parameter estimate $\hat\theta$ due to model misspecification error. As in Section 2, we apply a Taylor expansion around $(\theta_0, \xi_0)$, now with $\xi_0 = \xi(\theta_0)$, and obtain
$$F_\theta(\hat\theta, \xi) \approx F_\theta(\theta_0, \xi_0) + F_{\theta\theta}(\theta_0, \xi_0)(\hat\theta - \theta_0) + \sum_{g=1}^G F_{g,\theta\xi}(\theta_0, \xi_{g,0})(\xi_g - \xi_{g,0}). \tag{26}$$
Because $F_\theta(\hat\theta, \xi) = 0$ and $F_\theta(\theta_0, \xi_0) = 0$, using again the abbreviation $A = F_{\theta\theta}(\theta_0, \xi_0)$ and solving (26) for $\hat\theta$ yields
$$\hat\theta - \theta_0 = -A^{-1}\sum_{g=1}^G F_{g,\theta\xi}(\theta_0, \xi_{g,0})(\xi_g - \xi_{g,0}). \tag{27}$$
We now simplify (27) using the distributional assumptions on model misspecification. Denote by $M_{g,i}$ and $C_{g,ij}$ the columns of $F_{g,\theta\xi}$ (i.e., second-order derivatives of $F_g$) that correspond to the entries $\mu_{g,i}$ and $\sigma_{g,ij}$ of $\xi_g$, respectively. Moreover, we set $C_{g,ii} = 0$ and note that $C_{g,ij} = C_{g,ji}$. Since $\xi_g - \xi_{g,0}$ collects the model errors of group $g$, we obtain for the contribution of group $g$
$$\begin{aligned} F_{g,\theta\xi}(\theta_0, \xi_{g,0})(\xi_g - \xi_{g,0}) &= \sum_{i=1}^I M_{g,i}e_{\mu,g,i} + \sum_{i=1}^{I-1}\sum_{j=i+1}^I C_{g,ij}e_{\sigma,g,ij} \\ &= \sum_{i=1}^I M_{g,i}e_{\mu,g,i} + \sum_{i=1}^{I-1}\sum_{j=i+1}^I C_{g,ij}(u_{g,i} + u_{g,j} + v_{g,ij}) \\ &= \sum_{i=1}^I M_{g,i}e_{\mu,g,i} + \sum_{i=1}^I u_{g,i}\sum_{j=1}^I C_{g,ij} + \sum_{i=1}^{I-1}\sum_{j=i+1}^I C_{g,ij}v_{g,ij}, \end{aligned} \tag{28}$$
where the last step uses $C_{g,ij} = C_{g,ji}$ and $C_{g,ii} = 0$.
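We can now derive the variance $\tilde V_g = \mathrm{Var}\left[F_{g,\theta\xi}(\theta_0, \xi_{g,0})(\xi_g - \xi_{g,0})\right]$, treating $e_{\mu,g,i}$, $u_{g,i}$, and $v_{g,ij}$ as mutually uncorrelated with variances $\tau_\mu$, $\tau_{\sigma,2}$, and $\tau_{\sigma,1}$, respectively:
$$\tilde V_g = \tau_\mu\sum_{i=1}^I M_{g,i}M_{g,i}^\top + \tau_{\sigma,2}\sum_{i=1}^I\Big(\sum_{j=1}^I C_{g,ij}\Big)\Big(\sum_{j=1}^I C_{g,ij}\Big)^\top + \tau_{\sigma,1}\sum_{i=1}^{I-1}\sum_{j=i+1}^I C_{g,ij}C_{g,ij}^\top. \tag{29}$$
By using the abbreviation $\tilde V = \sum_{g=1}^G \tilde V_g$, we obtain from (27)
$$\mathrm{Var}(\hat\theta) = A^{-1}\tilde V A^{-1}. \tag{30}$$
The misspecification error (ME) for the $l$th entry $\hat\theta_l$ of $\hat\theta$ is given by
$$\mathrm{ME}(\hat\theta_l) = \sqrt{\mathrm{Var}(\hat\theta)_{ll}} = \sqrt{\left[A^{-1}\tilde V A^{-1}\right]_{ll}}. \tag{31}$$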
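Equations (29) to (31) can be sketched in Python with NumPy. The array layout is an assumption made for illustration: `M` holds one array per group whose rows are $M_{g,i}^\top$, and `C` holds one array per group with `C[i, j]` equal to $C_{g,ij}$. The test plugs in the single-group ULS setting of Example 1 ($A = -I(I-1)/2$, $C_{ij} = 1$ for $i \ne j$, no mean structure) and compares the result with the closed-form variance (40) derived in Section 4.

```python
import numpy as np

def misspecification_cov(A, M, C, tau_mu, tau_s2, tau_s1):
    """Covariance of theta_hat due to model error, following (29)-(30).

    A: (p, p) matrix F_theta_theta; M: list over groups of (I, p) arrays with
    rows M_{g,i}; C: list over groups of (I, I, p) arrays with C[i, j] = C_{g,ij},
    symmetric in (i, j) and zero on the diagonal.
    """
    p = A.shape[0]
    Vt = np.zeros((p, p))
    for Mg, Cg in zip(M, C):
        I = Mg.shape[0]
        Vt += tau_mu * Mg.T @ Mg                 # sum_i M_gi M_gi'
        Csum = Cg.sum(axis=1)                    # (I, p): sum_j C_gij
        Vt += tau_s2 * Csum.T @ Csum             # item effects u_{g,i}
        Cpairs = Cg[np.triu_indices(I, 1)]       # (number of pairs, p)
        Vt += tau_s1 * Cpairs.T @ Cpairs         # pair effects v_{g,ij}
    A_inv = np.linalg.inv(A)
    cov = A_inv @ Vt @ A_inv.T
    return cov, np.sqrt(np.diag(cov))
```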
As an alternative to the analytical solution in this subsection, the uncertainty in parameter estimates due to model misspecification can be assessed by parametric bootstrapping based on the stochastic model (18). Once the variance components for the residuals $e$ in (18) are estimated, new residuals $e^*$ are randomly drawn for each bootstrap sample, which yields a draw of the vector of sufficient statistics $\xi^* = \xi(\hat\theta) + e^*$. A parameter estimate $\hat\theta^*$ for each bootstrap sample is obtained by solving $F_\theta(\theta, \xi^*) = 0$. By drawing a large number of bootstrap samples, the distribution of $\hat\theta^*$ with respect to misspecification error can be determined (see [33,34] for a similar approach).
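A minimal sketch of this parametric bootstrap for a single group and the covariance structure only is shown below; mean-structure errors are omitted for brevity. It assumes NumPy, and `parametric_bootstrap_me` and `uls_phi` are hypothetical names. The toy solver `uls_phi` is the ULS factor-variance estimator of Example 1 in Section 4, so the bootstrap standard deviation can be compared with the closed-form result (40).

```python
import numpy as np

def parametric_bootstrap_me(sigma_implied, solve, tau2, tau1, n_boot=2000, seed=0):
    """Parametric bootstrap of the misspecification error for one group.

    sigma_implied: (I, I) model-implied covariance matrix Sigma(theta_hat);
    solve: function mapping a covariance matrix to a parameter estimate.
    Draws e* from (19) with zero diagonal and re-solves the model each time.
    """
    rng = np.random.default_rng(seed)
    I = sigma_implied.shape[0]
    estimates = []
    for _ in range(n_boot):
        u = rng.normal(0.0, np.sqrt(tau2), size=I)
        v = np.triu(rng.normal(0.0, np.sqrt(tau1), size=(I, I)), 1)
        e = u[:, None] + u[None, :] + v + v.T
        np.fill_diagonal(e, 0.0)                 # variances assumed correct
        estimates.append(solve(sigma_implied + e))
    return np.std(estimates, ddof=1)

def uls_phi(S):
    """ULS factor variance with unit loadings: mean of off-diagonal covariances."""
    return S[np.triu_indices(S.shape[0], 1)].mean()
```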

3.4. Computing the Total Error

The variance in (30) is due to the imposed stochastic model on model errors. In addition, the parameter estimate $\hat\theta$ is affected by sampling error, as derived in Section 2. Treating the two error sources as independent and adding the variance matrices in (16) (i.e., sampling error) and (30) (i.e., model error), we finally obtain the total variance matrix
$$\mathrm{Var}(\hat\theta) = A^{-1}\left(BVB^\top + \tilde V\right)A^{-1}. \tag{32}$$
Hence, the total error (TE) of the $l$th entry $\hat\theta_l$ of $\hat\theta$ is determined by
$$\mathrm{TE}(\hat\theta_l) = \sqrt{\mathrm{SE}(\hat\theta_l)^2 + \mathrm{ME}(\hat\theta_l)^2}. \tag{33}$$
To summarize, the steps described in this and the previous section yield the variance formula (32), which integrates sampling error and model error in a simultaneous inference without changing the estimation equation. These steps should suffice for a practical implementation of the proposed standard errors.
In the next section, we present illustrative examples of the computation of the model error component for the parameter estimate θ ^ .

4. Analytical Illustrative Examples

In this section, three illustrative examples are presented in which the variance due to model errors is quantified. In the examples, we assume infinite sample sizes; that is, sampling errors are ignored. Furthermore, we only consider ULS estimation. We believe that these simplifying assumptions make the properties of the modeled specification error easier to grasp.

4.1. Example 1: Misspecified Error Structure in Unidimensional Factor Analysis

In Example 1, we consider a unidimensional CFA in a single group. We assume equal loadings of one and estimate the factor variance $\phi$. The residual variances are allowed to vary across items. For $I$ items $X_1, \ldots, X_I$, the covariances are defined as $\sigma_{ij} = \mathrm{Cov}(X_i, X_j)$. The assumed stochastic model for model errors is defined as
$$\sigma_{ij} = \phi + e_{\sigma,ij}, \tag{34}$$
where the error is decomposed as (see (19))
$$e_{\sigma,ij} = u_i + u_j + v_{ij} \tag{35}$$
with $E(u_i) = E(v_{ij}) = 0$, $\mathrm{Var}(u_i) = \tau_{\sigma,2}$, and $\mathrm{Var}(v_{ij}) = \tau_{\sigma,1}$.
The estimating equation for $\phi$ in ULS estimation is given by
$$F_\theta(\theta; \xi) = \sum_{i=1}^{I-1}\sum_{j=i+1}^I(\sigma_{ij} - \phi) = 0. \tag{36}$$
Hence, the second-order derivative is given by
$$F_{\theta\theta}(\theta; \xi) = -\frac{I(I-1)}{2}. \tag{37}$$
By inserting the data-generating model for model errors into (36), we obtain
$$0 = -\frac{I(I-1)}{2}(\hat\phi - \phi) + \sum_{i=1}^{I-1}\sum_{j=i+1}^I e_{\sigma,ij}. \tag{38}$$
Solving this equation with respect to $\hat\phi$ and noting that each $u_i$ appears in $I - 1$ item pairs, we arrive at
$$\hat\phi = \phi + \frac{2}{I}\sum_{i=1}^I u_i + \frac{2}{I(I-1)}\sum_{i=1}^{I-1}\sum_{j=i+1}^I v_{ij}. \tag{39}$$
Hence, the variance in $\hat\phi$ due to model errors can be calculated as
$$\mathrm{Var}(\hat\phi) = \frac{4}{I}\tau_{\sigma,2} + \frac{2}{I(I-1)}\tau_{\sigma,1}. \tag{40}$$
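Because $\hat\phi - \phi$ is simply the average of the $I(I-1)/2$ covariance errors, (40) can be cross-checked exactly by computing the variance of that average from the moment structure (20). The following Python/NumPy sketch does this; both function names are hypothetical.

```python
import itertools
import numpy as np

def var_phi_closed_form(I, tau2, tau1):
    """Model-error variance (40) of the ULS factor variance in Example 1."""
    return 4.0 * tau2 / I + 2.0 * tau1 / (I * (I - 1))

def var_phi_direct(I, tau2, tau1):
    """Variance of the average covariance error, using the moments in (20)."""
    pairs = list(itertools.combinations(range(I), 2))
    cov = np.array([[tau2 * len(set(p) & set(q)) + (tau1 if p == q else 0.0)
                     for q in pairs] for p in pairs])
    w = np.full(len(pairs), 1.0 / len(pairs))   # equal weights of the average
    return w @ cov @ w
```

The formula also shows that the $\tau_{\sigma,2}$ term decreases only at rate $1/I$, whereas the $\tau_{\sigma,1}$ term decreases at rate $1/I^2$.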

4.2. Example 2: Misspecified Error Structure in Confirmatory Factor Analysis

In Example 2, we consider a two-dimensional CFA. The items $X_1, \ldots, X_{I_1}$ load on the first factor, while the items $X_{I_1+1}, \ldots, X_{I_1+I_2}$ load on the second factor. The analysis model contains no cross-loadings. However, by imposing a stochastic model for model errors, the effect of unmodeled cross-loadings is reflected in increased standard errors of the model parameter estimates.
The data-generating model for the covariances $\sigma_{ij} = \mathrm{Cov}(X_i, X_j)$ for $i \ne j$ is given by
$$\sigma_{ij} = \begin{cases} \phi_{11} + e_{\sigma,ij} & \text{if } i \le I_1 \text{ and } j \le I_1, \\ \phi_{12} + e_{\sigma,ij} & \text{if } i \le I_1 \text{ and } j \ge I_1 + 1, \\ \phi_{22} + e_{\sigma,ij} & \text{if } i \ge I_1 + 1 \text{ and } j \ge I_1 + 1. \end{cases} \tag{41}$$
As in Example 1, we impose the stochastic model for misspecified covariances as
$$e_{\sigma,ij} = u_i + u_j + v_{ij}. \tag{42}$$
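The estimating equation for the factor covariance $\phi_{12}$ based on ULS is given by (see [8])
$$F_\theta(\theta; \xi) = \sum_{i=1}^{I_1}\sum_{j=I_1+1}^{I_1+I_2}(\sigma_{ij} - \phi_{12}) = 0. \tag{43}$$
Hence, we obtain for the estimate $\hat\phi_{12}$
$$\hat\phi_{12} = \phi_{12} + \frac{1}{I_1 I_2}\sum_{i=1}^{I_1}\sum_{j=I_1+1}^{I_1+I_2}e_{\sigma,ij}. \tag{44}$$
Because each item of the first factor appears in $I_2$ of these pairs and each item of the second factor in $I_1$ of them, the variance of $\hat\phi_{12}$ due to model misspecification is
$$\mathrm{Var}(\hat\phi_{12}) = \left(\frac{1}{I_1} + \frac{1}{I_2}\right)\tau_{\sigma,2} + \frac{1}{I_1 I_2}\tau_{\sigma,1}. \tag{45}$$
Interestingly, if the item effects have nonzero variance (i.e., $\tau_{\sigma,2} > 0$), the variance of $\hat\phi_{12}$ is mainly determined by the smaller number of items per factor, $\min(I_1, I_2)$. The term $(1/I_1 + 1/I_2)\tau_{\sigma,2}$ cannot fall below $\tau_{\sigma,2}/\min(I_1, I_2)$, no matter how many items the other factor has.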
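Formula (45) can be verified in the same way as (40): compute the exact variance of the average of the between-factor covariance errors from (20). The Python/NumPy sketch below uses hypothetical function names and 0-based item indices.

```python
import numpy as np

def var_phi12_closed_form(I1, I2, tau2, tau1):
    """Model-error variance (45) of the ULS factor covariance in Example 2."""
    return (1.0 / I1 + 1.0 / I2) * tau2 + tau1 / (I1 * I2)

def var_phi12_direct(I1, I2, tau2, tau1):
    """Variance of the average error over pairs (i, j) with i in factor 1 and
    j in factor 2 (0-based: i < I1 <= j), using the moments in (20)."""
    pairs = [(i, j) for i in range(I1) for j in range(I1, I1 + I2)]
    cov = np.array([[tau2 * len(set(p) & set(q)) + (tau1 if p == q else 0.0)
                     for q in pairs] for p in pairs])
    w = np.full(len(pairs), 1.0 / len(pairs))
    return w @ cov @ w
```

For example, with $\tau_{\sigma,2} > 0$, a design with 2 and 100 items per factor yields a larger variance than a design with 10 items on each factor.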

4.3. Example 3: Measurement Noninvariance in Multiple-Group SEM

Finally, in Example 3, we quantify the extent of measurement noninvariance through increased standard errors of parameter estimates. It has been argued that a misspecified SEM results if violations of measurement invariance are intentionally ignored in model estimation [35,36,37,38].
We assume a multiple-group unidimensional CFA with model deviations in the mean structure of group $g$ that refer to measurement noninvariance:
$$\mu_{g,i} = \nu_i + \lambda_i\alpha_g + e_{\mu,g,i}. \tag{46}$$
The estimating equation for the mean $\alpha_g$ of group $g$ is given by
$$F_\theta(\theta; \xi) = \sum_{i=1}^I(\mu_{g,i} - \nu_i - \lambda_i\alpha_g) = 0, \tag{47}$$
where the parameters $\nu_i$ and $\lambda_i$ ($i = 1, \ldots, I$) are assumed to be already estimated for ease of presentation. Then, we obtain the estimate $\hat\alpha_g$ as
$$\hat\alpha_g = \frac{\sum_{i=1}^I(\mu_{g,i} - \nu_i)}{\sum_{i=1}^I\lambda_i} = \alpha_g + \frac{\sum_{i=1}^I e_{\mu,g,i}}{\sum_{i=1}^I\lambda_i}. \tag{48}$$
Therefore, the variance of $\hat\alpha_g$ can be computed as
$$\mathrm{Var}(\hat\alpha_g) = \frac{I}{\left(\sum_{i=1}^I\lambda_i\right)^2}\tau_\mu. \tag{49}$$
If all loadings are set to one (i.e., $\lambda_i = 1$ for all $i = 1, \ldots, I$), we obtain from (49)
$$\mathrm{Var}(\hat\alpha_g) = \frac{1}{I}\tau_\mu. \tag{50}$$
The quantity in (50) corresponds to the well-known linking error of the one-parameter logistic item response model [39,40,41,42]. Hence, the quantification of model misspecification can be seen as an alternative to assessing uncertainty in model parameters regarding the selected items. To some extent, one can argue that modeling model misspecification is conceptually equivalent to assessing linking errors, although linking errors are mainly considered in multiple-group settings.
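The results of Example 3 can be checked numerically. The Python/NumPy sketch below (function names and loadings are hypothetical) computes the estimate (48) and the model-error standard deviation implied by (49). With unit loadings, the latter reduces to $\sqrt{\tau_\mu/I}$, the linking-error form in (50). The test simulates noninvariant item means from (46) and compares the spread of $\hat\alpha_g$ with the formula.

```python
import numpy as np

def alpha_hat(mu_g, nu, lam):
    """Group mean estimate (48): sum_i (mu_gi - nu_i) / sum_i lambda_i."""
    return np.sum(np.asarray(mu_g) - np.asarray(nu)) / np.sum(lam)

def alpha_model_error(lam, tau_mu):
    """Model-error standard deviation of alpha_hat_g implied by (49)."""
    lam = np.asarray(lam, dtype=float)
    return np.sqrt(len(lam) * tau_mu / np.sum(lam) ** 2)
```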

5. Numerical Illustrative Example: ESS 2005 Data

5.1. Method

In this empirical example, we use a dataset that was also analyzed in [25,43,44,45]. The data came from the European Social Survey (ESS) conducted in 2005 (ESS 2005), which included subjects from 26 countries. The latent factor variable of tradition and conformity was assessed by four items presented in portrait format, where the scale of the items is such that a high value represents a low level of tradition and conformity. The wording of the four items was as follows (see [45]): “It is important for him to be humble and modest. He tries not to draw attention to himself” (item TR9); “Tradition is important to him. He tries to follow the customs handed down by his religion or family” (item TR20); “He believes that people should do what they’re told. He thinks people should follow rules at all times, even when no one is watching” (item CO7); and “It is important for him to always behave properly. He wants to avoid doing anything people would say is wrong” (item CO16). The full dataset used in [45] was downloaded from https://www.statmodel.com/Alignment.shtml (accessed on 9 May 2023).
In this application, we used ten selected countries (C01, C05, C08, C10, C13, C15, C16, C17, C21, and C25, using the country labels from [25]). This resulted in a subsample of N = 19,916 persons. The sample sizes per country ranged between 1450 and 2622, with an average of 1991.6 (SD = 375.4). We only included participants who had no missing values on any of the four items.
A multiple-group one-dimensional factor model with 10 groups (i.e., 10 countries) was specified, assuming invariant item intercepts $\nu_i$ and factor loadings $\lambda_i$ ($i = 1, \ldots, 4$) across countries. The residual variances were allowed to vary across countries. The CFA model was identified by fixing the factor mean of the first group to 0 and the factor variance in the first group to 1.
To compute standard errors with respect to the sampling of persons, nonparametric bootstrapping of persons was conducted. In total, R = 100 bootstrap samples were drawn. We used the nonparametric bootstrap samples to obtain bias-corrected estimates of the variance components for the stochastic model for misspecification errors (see Section 3.2). Misspecification error in parameter estimates was determined by parametric bootstrapping (see Section 3.3) using 200 bootstrap samples. The total error for all parameter estimates that comprised standard error and misspecification error was computed using Equation (33).
Because jackknifing items has been suggested to investigate the stability of parameter estimates under changes in the model [40,46,47], we also computed the misspecification error with a jackknife-based variability measure. The dataset included four items, so each jackknife sample of items included three items. An alternative misspecification error based on the jackknife (JKME) was obtained by applying the jackknife standard error formula [32].
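The item-jackknife computation can be sketched generically. In the sketch below (Python with NumPy), `estimate` is a hypothetical callback that refits the model with a given subset of items and returns one parameter estimate. The standard jackknife formula $\sqrt{(J-1)/J\sum_k(t_k - \bar t)^2}$ is then applied to the $J$ leave-one-item-out estimates. The test uses a simple mean over per-item values, for which the jackknife reproduces the usual standard error of a mean.

```python
import numpy as np

def item_jackknife_error(estimate, items):
    """Jackknife variability of a parameter across leave-one-item-out fits.

    estimate: function mapping a list of items to a parameter estimate
    (hypothetical; e.g., refitting the SEM with those items).
    """
    J = len(items)
    t = np.array([estimate([it for it in items if it != drop]) for drop in items])
    return np.sqrt((J - 1) / J * np.sum((t - t.mean()) ** 2))
```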
The obtained country means and country standard deviations of the factor variable were linearly transformed for all different estimators such that the total population comprising all persons from all 10 countries had a mean of 0 and a standard deviation of 1. Hence, the factor variable was standardized in the total population that comprised all 10 countries. The multiple-group SEM was estimated with ULS.
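One way to carry out this standardization is to compute the mean and variance of the pooled population as a mixture of the country distributions. The country means are then shifted and rescaled, and the country variances are divided by the pooled variance. The sketch below (Python with NumPy; the function name and numbers are hypothetical) assumes country weights proportional to sample sizes, which the text does not state explicitly.

```python
import numpy as np

def standardize_total(alpha, phi, n):
    """Rescale country factor means alpha_g and variances phi_g so that the
    pooled population (weights n_g / sum(n), an assumption) has mean 0 and SD 1."""
    alpha, phi, w = (np.asarray(a, dtype=float) for a in (alpha, phi, n))
    w = w / w.sum()
    m = np.sum(w * alpha)                                   # pooled mean
    s = np.sqrt(np.sum(w * (phi + (alpha - m) ** 2)))       # pooled SD
    return (alpha - m) / s, phi / s ** 2
```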
The analysis was conducted using the sirt:::mgsem() function from the R [48] package sirt [49]. The dataset used in this analysis can be found at https://osf.io/hj3k9/ (accessed on 9 May 2023).

5.2. Results

The variance $\tau_\mu$ of residuals in the mean structure was estimated as 0.0332 with the raw estimation method. The bias-corrected estimate used for subsequent statistical inference was slightly smaller (0.0328). The estimate of $\tau_{\sigma,2}$ was negative and was set to 0. The raw estimate of $\tau_{\sigma,1}$ was 0.0050, while the bias-corrected estimate was 0.0042. These results indicate that misspecification was more severe in the mean structure than in the covariance structure.
In Table 1, parameter estimates and their standard errors (SE), misspecification errors, and total errors are displayed. The misspecification errors (ME; estimated by parametric bootstrapping) were substantially larger than the standard errors. The error ratio (ER), defined as the quotient of ME and SE, was on average 4.69 (SD = 1.15). This indicates that inferences regarding the factor means are much more affected by the choice of items than by the sampling of persons. For factor means, the jackknife misspecification error (JKME) had an average of 0.220, which was very similar to the average of 0.214 of the ME values. Notably, the total error (TE) was mainly determined by the ME: on average, total errors were about 4.8 times as large as standard errors.
The situation was quite different for factor variances. The error ratio averaged 1.14, indicating that sampling errors and misspecification errors had a similar impact on the uncertainty in factor variances. Total errors were on average 52% larger than standard errors. Notably, the JKME (M = 0.326) was much larger than the ME (M = 0.130). We suspect that the jackknife standard error computation does not reflect the stochastic model for residuals in covariances, which might explain this large difference.
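The summary statistics for Table 1 can be recomputed from the rounded table entries. The sketch below assumes that the total error in (33) combines the two error components as TE = sqrt(SE² + ME²). This assumption is consistent with the tabulated TE values. Because the inputs are rounded, the averages can differ slightly from the reported ones.

```python
import numpy as np

# Rounded SE and ME values from Table 1 (estimated factor means and variances, countries 2-10)
se_mean = np.array([0.046, 0.061, 0.052, 0.031, 0.055, 0.049, 0.042, 0.041, 0.050])
me_mean = np.array([0.226, 0.187, 0.205, 0.220, 0.214, 0.215, 0.239, 0.206, 0.213])
se_var = np.array([0.141, 0.097, 0.149, 0.100, 0.100, 0.172, 0.116, 0.086, 0.092])
me_var = np.array([0.135, 0.118, 0.152, 0.123, 0.131, 0.154, 0.135, 0.105, 0.116])


def error_summary(se, me):
    te = np.sqrt(se**2 + me**2)   # total error: sampling and misspecification variances add
    er = me / se                  # error ratio ER = ME / SE
    return te, er.mean(), (te / se).mean()


te_mean, er_mean, inflation_mean = error_summary(se_mean, me_mean)
te_var, er_var, inflation_var = error_summary(se_var, me_var)
print(np.round(te_mean, 3), round(er_mean, 2), round(inflation_mean, 2))
print(np.round(te_var, 3), round(er_var, 2), round(inflation_var, 2))
```

With these inputs, the average ratio TE/SE is approximately 4.8 for factor means and approximately 1.52 for factor variances.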
Table 2 presents factor means and factor standard deviations with their error estimates after the factor variable was standardized in the total population comprising all 10 countries. The ME estimates for factor means had an average of 0.121 (SD = 0.010). The estimates based on jackknifing items (i.e., JKME) had a similar average of 0.133, but their variability across countries was much larger (SD = 0.058). This difference likely arises because the JKME allows the extent of model misspecification to vary across countries, whereas the ME assumes it to be homogeneous across countries. Moreover, the ME estimates for factor means under population standardization (Table 2) were much smaller than those in the model that used the first country as the reference (Table 1). Setting scaling issues aside, this finding is explained by the comparison that each parameter estimate represents. Under population standardization, a country is compared with the average across all countries. Because the misspecification in this average is minor, essentially only the ME of the respective country enters. In the model with a reference country, a country is compared with the reference country, so the ME reflects the misspecification of both the respective country and the reference country. Hence, the difference is fully in line with what can be expected under different identification constraints for factor means and factor variances. For the factor standard deviations, and in line with the results for factor variances in Table 1, the jackknife-based estimates of ME (i.e., JKME) were substantially larger than the ME estimates.
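The argument about identification constraints can be illustrated with a simplified model. This model is an assumption and not the article's variance formula. Each country mean carries an independent misspecification error with common variance τ. Countries are weighted equally, and the rescaling of the factor variable is ignored. Under these assumptions, a contrast with a reference country has error variance 2τ, whereas a contrast with the average of G countries has error variance τ(1 − 1/G). The simulation below checks the resulting ratio of misspecification errors. The function name is hypothetical.

```python
import numpy as np


def me_ratio_reference_vs_population(n_groups=10, tau=1.0, n_rep=200_000, seed=1):
    """Ratio of misspecification SDs: reference-country contrast vs. contrast with the group average."""
    rng = np.random.default_rng(seed)
    # d[r, g]: hypothetical misspecification error in the mean of country g (replication r)
    d = rng.normal(0.0, np.sqrt(tau), size=(n_rep, n_groups))
    ref_contrast = d[:, 1] - d[:, 0]           # country 2 vs. reference country 1
    pop_contrast = d[:, 1] - d.mean(axis=1)    # country 2 vs. average of all countries
    return ref_contrast.std() / pop_contrast.std()


G = 10
ratio = me_ratio_reference_vs_population(G)
print(ratio, np.sqrt(2 / (1 - 1 / G)))  # simulated vs. analytic ratio
```

For 10 countries, the analytic ratio is sqrt(2/0.9), or about 1.49. Under this simplified model, misspecification errors relative to a reference country are therefore noticeably larger.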

6. Discussion

In this article, we presented a simultaneous statistical inference for sampling errors and model errors in single-group and multiple-group SEMs. Our framework closely follows that of Wu and Browne [9]. It differs in that the parameters are still estimated with the fitting function chosen for the target model (e.g., maximum likelihood or diagonally weighted least squares), and only the variance matrix of the parameter estimates is modified.
The procedure can be summarized as follows. Let $\hat{\xi}$ contain the estimated (group) means and (group) covariances. An SEM estimates a parameter $\hat{\theta}$ that summarizes the mean and covariance structure, so we expect the model-implied means and covariances to approximate the observed ones; that is, $\hat{\xi} \approx \xi(\hat{\theta})$. In samples, $\hat{\xi}$ is an estimate of the population means and covariances $\xi$, and we can write $\hat{\xi} = \xi + \varepsilon$ with a sampling error $\varepsilon$. Let $\theta_0$ denote the parameter that fits the population means and covariances best with respect to a chosen fitting function $F$. Typically, the SEM will be misspecified; that is, there is a nonzero vector of residuals $e$ such that $\xi = \xi(\theta_0) + e$. The vector $e$ is also referred to as a model error. Sampling errors $\varepsilon$ and model errors $e$ thus appear simultaneously, and the estimated means and covariances can be written as $\hat{\xi} = \xi(\theta_0) + e + \varepsilon$. In this paper, a stochastic model is imposed on the model errors $e$, which allows statistical inference for $\hat{\theta}$ as a function of the parameter $\theta_0$ and the two errors $e$ and $\varepsilon$. Ordinary statistical inference reflects only the sampling error $\varepsilon$ in the standard errors of parameter estimates, while the proposed method additionally includes model errors in the standard errors.
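The decomposition $\hat{\xi} = \xi(\theta_0) + e + \varepsilon$ can be illustrated with a deliberately simple, hypothetical model. All $I$ item means share one parameter, $\xi(\theta) = \theta \mathbf{1}$, so the ULS estimate is the average of the observed means. Model errors and sampling errors are drawn independently across items. This is an assumption; in real data, the sampling errors of item means are correlated. All numbers are made up. The replication SD of $\hat{\theta}$ matches the total error that combines both sources, not the ordinary standard error.

```python
import numpy as np

rng = np.random.default_rng(2023)
n_items, n_persons, n_rep = 20, 500, 50_000  # hypothetical design
theta0 = 0.5      # target parameter of the one-parameter mean structure
tau_mu = 0.03     # hypothetical variance of the model errors e_i
s2 = 1.0          # hypothetical within-person variance of each item

# xi_hat = xi(theta0) + e + eps with xi(theta) = theta * 1 (all item means equal)
e = rng.normal(0.0, np.sqrt(tau_mu), size=(n_rep, n_items))            # model errors
eps = rng.normal(0.0, np.sqrt(s2 / n_persons), size=(n_rep, n_items))  # sampling errors
xi_hat = theta0 + e + eps
theta_hat = xi_hat.mean(axis=1)   # ULS estimate: minimizes sum_i (xi_hat_i - theta)^2

se = np.sqrt(s2 / n_persons / n_items)  # standard error (sampling error only)
me = np.sqrt(tau_mu / n_items)          # misspecification error
te = np.sqrt(se**2 + me**2)             # total error
print(theta_hat.std(), se, me, te)
```

In this setting, ignoring $e$ would report a standard error of 0.01. The variability of $\hat{\theta}$ across replications is close to the total error of 0.04.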
The illustrations in Section 4 and the empirical example in Section 5 used ULS estimation, but the derivations apply to any differentiable fitting function for SEMs, such as the ML fitting function. In our stochastic model for misspecification, we assume that the diagonal elements of the covariance residual matrix are zero. In other words, at the population level, the model-implied variances and the total variances of all items coincide. With ULS estimation, this is likely fulfilled if residual variances are estimated group-specifically, which yields zero residuals for the variances. With ML estimation, however, the residuals for variances typically differ from zero even when the residual variances are group-specific. We therefore suppose that the stochastic model for misspecification must be slightly adapted for ML estimation.
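The difference between ULS and ML with respect to the diagonal residuals follows from the first-order conditions for the residual variances. The sketch below makes several assumptions: a single group, the covariance part of the fitting functions only, a diagonal residual covariance matrix $\Psi = \mathrm{diag}(\psi_1, \ldots, \psi_p)$ whose element $\psi_i$ enters only $\sigma_{ii}(\theta)$, and an interior solution with no boundary estimates. The notation $S = (s_{ij})$ for the observed covariance matrix is introduced here for illustration. With group-specific $\psi_{gi}$, the same argument applies within each group.

```latex
% ULS: only the (i,i) term depends on psi_i, so the first-order condition zeroes the diagonal residual
F_{\mathrm{ULS}}(\theta) = \tfrac{1}{2} \sum_{i,j} \bigl(s_{ij} - \sigma_{ij}(\theta)\bigr)^2,
\qquad
\frac{\partial F_{\mathrm{ULS}}}{\partial \psi_i} = -\bigl(s_{ii} - \sigma_{ii}(\theta)\bigr) = 0
\;\Longrightarrow\; s_{ii} = \sigma_{ii}(\hat{\theta})

% ML: the first-order condition involves all residuals, weighted through Sigma^{-1}
F_{\mathrm{ML}}(\theta) = \log\lvert\Sigma(\theta)\rvert + \operatorname{tr}\bigl(S\,\Sigma(\theta)^{-1}\bigr) - \log\lvert S\rvert - p,
\qquad
\frac{\partial F_{\mathrm{ML}}}{\partial \psi_i}
  = \Bigl[\Sigma(\theta)^{-1}\bigl(\Sigma(\theta) - S\bigr)\Sigma(\theta)^{-1}\Bigr]_{ii} = 0
```

The ML condition sets a weighted combination of all residuals to zero. It does not, in general, imply $s_{ii} = \sigma_{ii}(\hat{\theta})$, which is why diagonal residuals typically remain nonzero under ML.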
In this article, analytical illustrations and an empirical example were provided. Future studies could investigate the performance of our approach in simulation studies. Nevertheless, we believe that the proposed method has clear asymptotic foundations. We suspect that the product of the number of items and the number of groups is critical for the reliable estimation of the variance components of the stochastic model for model misspecification.
Because our approach relies only on modeling misspecification in the mean and covariance structures, it can be directly applied to SEMs with ordinal data. The mean vector is replaced by a vector of thresholds, and the covariance matrix is replaced by a polychoric correlation matrix. In this case, the closer correspondence to linking errors, which are mainly discussed for item response models, could be investigated.
We would like to note that the simultaneous assessment of sampling errors and model specification errors has similarities to generalizability theory [50,51,52], domain sampling theory [53,54,55], or linking errors [56,57]. Notably, resampling techniques regarding items could also approximate statistical inference with respect to model misspecification [57].
In our approach, we opt for a particular stochastic model for model specification errors. There is always some ambiguity in choosing such a stochastic model. For example, model errors of the same item or the same item pair might be correlated across groups. Such an extension can also be handled by our estimation approach with slight changes in the variance formula.
Our derivations show that the misspecification error decreases as the number of items increases. This result is a consequence of assuming independently distributed model errors across items. In some cases, a two-level model for misspecification in the mean structure might be more plausible, with items nested within item groups (or item clusters). In this case, model residuals in the mean structure might be positively correlated within an item group.
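The consequence of item clustering can be sketched with a hypothetical two-level model for mean residuals, $e_i = c_{k[i]} + w_i$. Here $c_k$ is an item-cluster effect with variance $\tau_c$ and $w_i$ is an item-specific effect with variance $\tau_w$. Under this assumption, the variance of the average residual over $K$ clusters with $m$ items each is $\tau_c/K + \tau_w/(Km)$. With a fixed number of clusters, adding items therefore cannot push the misspecification error below $\sqrt{\tau_c/K}$. The function name and all variance values are illustrative assumptions, and the simple average stands in for a parameter estimate.

```python
import numpy as np


def me_of_item_mean(n_clusters, items_per_cluster, tau_c, tau_w):
    """SD of the average model error when e_i = c_k(i) + w_i.

    tau_c: hypothetical variance of item-cluster effects c_k
    tau_w: hypothetical variance of item-specific effects w_i
    """
    n_items = n_clusters * items_per_cluster
    return np.sqrt(tau_c / n_clusters + tau_w / n_items)


# Same total residual variance (0.03) per item in both scenarios, 4 item clusters
for m in (2, 5, 20, 100):
    independent = me_of_item_mean(4, m, tau_c=0.0, tau_w=0.03)
    clustered = me_of_item_mean(4, m, tau_c=0.01, tau_w=0.02)
    print(4 * m, round(independent, 4), round(clustered, 4))

# Monte Carlo check of the clustered formula for 4 clusters with 5 items each
rng = np.random.default_rng(7)
n_rep = 100_000
c = rng.normal(0.0, np.sqrt(0.01), size=(n_rep, 4))     # cluster effects
w = rng.normal(0.0, np.sqrt(0.02), size=(n_rep, 20))    # item-specific effects
e = np.repeat(c, 5, axis=1) + w                         # items 1-5 in cluster 1, etc.
sim_me = e.mean(axis=1).std()
print(sim_me, me_of_item_mean(4, 5, 0.01, 0.02))
```

With independent residuals, the error shrinks toward zero as items are added. With clustered residuals, it levels off at sqrt(0.01/4) = 0.05.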
One could argue that the stochastic model for model misspecification is irrelevant because it does not refer to an actual sampling model of items or to the effects of model discrepancies in data generation (or data collection). We do not consider this a viable objection. In any statistical model, an investigator defines randomness by means of random variables, which need not have any connection to a sampling procedure. The model residuals are merely modeled as a random variable. Their variability is quantified by a single variance for the mean structure and two variances for the covariance structure. Independence assumptions for residuals are comparable to random sampling assumptions for persons. In a concrete sample, there is no test to determine whether the independence assumption across persons is fulfilled; it is simply a definition that can be useful for statistical inference in applications. The same holds true for model residuals: they are simply assumed to be independent according to a stochastic model, and this assumption cannot be (fully) tested.
If model misspecification is present, one may question whether ML should be the preferred estimation method. ML achieves the most efficient parameter estimates if the analysis model is correctly specified. In the presence of model misspecification, however, ML can produce more variable estimates than DWLS or ULS estimation. Choosing between ML and DWLS (or ULS) is therefore a decision about whether the estimation should primarily account for sampling errors (i.e., preferring ML) or for model residuals (i.e., preferring DWLS or ULS). We tend to prefer DWLS in many, if not almost all, applications because correct model specification is generally not guaranteed.
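The trade-off described above can be illustrated with a stylized analogy. The example does not use the SEM estimators themselves. A common parameter is estimated from item means $\hat{\xi}_i = \theta + e_i + \varepsilon_i$ that have different sampling variances $s_i^2$ and a common model-error variance $\tau$. Inverse-sampling-variance weights resemble the weighting in ML, and equal weights resemble ULS. DWLS lies in between and is not modeled here. The sampling variances are hypothetical.

```python
import numpy as np


def weighted_mean_variance(weights, s2, tau):
    """Variance of sum_i w_i xi_hat_i / sum_i w_i when xi_hat_i = theta + e_i + eps_i,
    Var(e_i) = tau (model error) and Var(eps_i) = s2_i (sampling error), all independent."""
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w**2 * (tau + s2)) / np.sum(w) ** 2)


s2 = np.array([0.001, 0.004, 0.009, 0.016])   # hypothetical sampling variances
inverse_var = 1.0 / s2                          # ML-like weighting
equal = np.ones_like(s2)                        # ULS-like weighting

for tau in (0.0, 0.05):
    print(tau, weighted_mean_variance(inverse_var, s2, tau), weighted_mean_variance(equal, s2, tau))
```

Without model error (tau = 0), inverse-variance weighting gives the smaller variance. With substantial model error (tau = 0.05), equal weighting is less variable, because the model errors affect all items equally and the inverse-variance weights concentrate on a few items.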
It can always be argued that researchers should not interpret parameter estimates from misspecified models [58]. However, we disagree with this view. A data-driven modification of a target model of interest changes the meaning of the primary model parameters. Researchers thereby implicitly change the meaning of the latent variables and their relationships in an SEM. We do not see why statistics (or psychometrics as a special branch of it) should redefine the target estimand of interest in a data-driven way. Rather, researchers intentionally use models because they describe some phenomenon of interest. In our view, model misspecification should not lead to model modification but should be reported as a type of error. We believe that including model misspecification as an increase in the standard errors of parameter estimates is a viable way to quantify model errors. We hope that it can be applied in standard research practices that utilize SEMs.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The dataset used in Section 5 can be found at https://osf.io/hj3k9/ (accessed on 9 May 2023).

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
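CFA: confirmatory factor analysis
DWLS: diagonally weighted least squares
ER: error ratio
ESS: European Social Survey
JKME: misspecification error estimated by jackknifing items
ME: misspecification error
ML: maximum likelihood
MVN: multivariate normal
SE: standard error
SEM: structural equation model
TE: total error
ULS: unweighted least squares
WLS: weighted least squares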

Appendix A. Motivation of the Stochastic Model (19) for Model Misspecification in Covariances

We now present the motivation for the stochastic model in (19). Assume that the data-generating model is a one-dimensional factor model
$$X_i = \nu_i + \lambda_i \eta_1 + \varepsilon_i \tag{A1}$$
for items $i = 1, \ldots, I$. The factor variable $\eta_1$ has a mean of 0 and a variance of 1. Moreover, there exist residual covariances $\psi_{ij}$ between items $i$ and $j$. The analysis model assumes equal factor loadings and does not model residual covariances. We assume that factor loadings can be decomposed into
$$\lambda_i = \lambda_0 + \omega_i, \tag{A2}$$
where the loading residuals $\omega_i$ have zero means (i.e., the average loading of the $I$ items is $\lambda_0$).
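The covariance between the observed variables $X_i$ and $X_j$ ($i \neq j$) is given as
$$\mathrm{Cov}(X_i, X_j) = \lambda_i \lambda_j + \psi_{ij} = \lambda_0^2 + \lambda_0 \omega_i + \lambda_0 \omega_j + \omega_i \omega_j + \psi_{ij}. \tag{A3}$$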
The model-implied covariance $\sigma_{ij}(\theta)$ is $\lambda_0^2$. Hence, the residuals in covariances are computed as
$$e_{\sigma,ij} = \lambda_0 \omega_i + \lambda_0 \omega_j + \omega_i \omega_j + \psi_{ij}. \tag{A4}$$
By defining $u_i = \lambda_0 \omega_i$ and $v_{ij} = \omega_i \omega_j + \psi_{ij}$, we obtain the same stochastic model as in (19):
$$e_{\sigma,ij} = u_i + u_j + v_{ij}. \tag{A5}$$
The independence of the variables $u_i$ is assured if the residual loadings $\omega_i$ are assumed to be independent with $E(\omega_i^2) = \kappa_\omega$. Now, we additionally assume that $\omega_i$ is normally distributed. Furthermore, because $E(\omega_i \omega_j) = 0$, we obtain for $i \neq j$ and $k \neq h$ the covariance
$$\mathrm{Cov}(\omega_i \omega_j, \omega_k \omega_h) = \begin{cases} 0 & \text{if } \operatorname{card}(\{i,j\} \cap \{k,h\}) = 0, \\ 0 & \text{if } \operatorname{card}(\{i,j\} \cap \{k,h\}) = 1, \\ \kappa_\omega^2 & \text{if } \operatorname{card}(\{i,j\} \cap \{k,h\}) = 2. \end{cases} \tag{A6}$$
Due to (A6), the variables $v_{ij}$ in (A5) are uncorrelated.
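The covariance structure implied by (A5) can be checked by simulation. The sketch assumes hypothetical values for $\lambda_0$ and $\kappa_\omega$. It also assumes that the residual covariances $\psi_{ij}$ are independent normal variables with a hypothetical variance `tau_psi`, which the text does not specify. Residuals that share an item are correlated only through $u_i$, with covariance $\mathrm{Var}(u_i) = \lambda_0^2 \kappa_\omega$. Residuals of disjoint item pairs are uncorrelated. A single residual has variance $2\lambda_0^2\kappa_\omega + \kappa_\omega^2 + \tau_\psi$.

```python
import numpy as np

rng = np.random.default_rng(11)
n_rep = 200_000
lam0, kappa, tau_psi = 0.6, 0.01, 0.002   # hypothetical lambda_0, kappa_omega, Var(psi_ij)

omega = rng.normal(0.0, np.sqrt(kappa), size=(n_rep, 4))   # loading residuals of items 1-4
lam = lam0 + omega                                           # lambda_i = lambda_0 + omega_i


def residual(i, j):
    """e_sigma,ij = lambda_i * lambda_j + psi_ij - lambda_0^2 (model-implied covariance lambda_0^2)."""
    psi_ij = rng.normal(0.0, np.sqrt(tau_psi), size=n_rep)  # residual covariance psi_ij
    return lam[:, i] * lam[:, j] + psi_ij - lam0**2


# Column indices are 0-based: e01 = items 1 and 2, e02 = items 1 and 3, e23 = items 3 and 4
e01, e02, e23 = residual(0, 1), residual(0, 2), residual(2, 3)
print(np.var(e01), 2 * lam0**2 * kappa + kappa**2 + tau_psi)  # Var(u_i) + Var(u_j) + Var(v_ij)
print(np.cov(e01, e02)[0, 1], lam0**2 * kappa)                # shared item 1: Var(u_1)
print(np.cov(e01, e23)[0, 1])                                 # disjoint pairs: about 0
```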
The stochastic model becomes more complicated if the factor model involves multiple latent variables. Assume that each item $i$ loads on a dimension $d[i]$ and that the products $\omega_i \omega_j$ are approximately zero. Furthermore, let $\phi_{de}$ be the covariance between the latent factors $\eta_d$ and $\eta_e$. Then, a modified stochastic model of (A5) for $d[i] \neq d[j]$ is given as
$$e_{\sigma,ij} = \phi_{d[i]d[j]} u_i + \phi_{d[i]d[j]} u_j + v_{ij}. \tag{A7}$$
The covariances $\phi_{de}$ can be estimated when fitting the CFA model. Hence, the variances of $u_i$ and $v_{ij}$ in (A7) can be estimated with a cross-classified two-level model with random slopes.
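The step from (A5) to (A7) can be spelled out under the following assumptions, which mirror the one-dimensional case. The data-generating model is assumed to be $X_i = \nu_i + \lambda_i \eta_{d[i]} + \varepsilon_i$. The analysis model uses a common loading $\lambda_0$, so that $\sigma_{ij}(\theta) = \lambda_0^2 \phi_{d[i]d[j]}$. The derivation is shown for $d[i] \neq d[j]$. The same algebra applies within a dimension, with $\phi_{dd}$ the factor variance.

```latex
\begin{aligned}
\mathrm{Cov}(X_i, X_j) &= \lambda_i \lambda_j \phi_{d[i]d[j]} + \psi_{ij}
  = \phi_{d[i]d[j]} \bigl(\lambda_0^2 + \lambda_0 \omega_i + \lambda_0 \omega_j + \omega_i \omega_j\bigr) + \psi_{ij}, \\
e_{\sigma,ij} &= \mathrm{Cov}(X_i, X_j) - \lambda_0^2 \phi_{d[i]d[j]}
  = \phi_{d[i]d[j]}\, u_i + \phi_{d[i]d[j]}\, u_j
    + \underbrace{\phi_{d[i]d[j]}\, \omega_i \omega_j + \psi_{ij}}_{=\, v_{ij}},
  \qquad u_i = \lambda_0 \omega_i .
\end{aligned}
```

When the products $\omega_i \omega_j$ are negligible, $v_{ij} \approx \psi_{ij}$. The factor covariance $\phi_{d[i]d[j]}$ then acts as a known slope on the item effects $u_i$ and $u_j$, which motivates the random-slope formulation.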

References

  1. Bartholomew, D.J.; Knott, M.; Moustaki, I. Latent Variable Models and Factor Analysis: A Unified Approach; Wiley: New York, NY, USA, 2011.
  2. Bollen, K.A. Structural Equations with Latent Variables; Wiley: New York, NY, USA, 1989.
  3. Browne, M.W.; Arminger, G. Specification and Estimation of Mean-and Covariance-Structure Models. In Handbook of Statistical Modeling for the Social and Behavioral Sciences; Arminger, G., Clogg, C.C., Sobel, M.E., Eds.; Springer: Boston, MA, USA, 1995; pp. 185–249.
  4. Jöreskog, K.G.; Olsson, U.H.; Wallentin, F.Y. Multivariate Analysis with LISREL; Springer: Basel, Switzerland, 2016.
  5. Mulaik, S.A. Foundations of Factor Analysis; CRC Press: Boca Raton, FL, USA, 2009.
  6. Shapiro, A. Statistical Inference of Covariance Structures. In Current Topics in the Theory and Application of Latent Variable Models; Edwards, M.C., MacCallum, R.C., Eds.; Routledge: Abingdon-on-Thames, UK, 2012; pp. 222–240.
  7. Yuan, K.H.; Bentler, P.M. Structural Equation Modeling. In Handbook of Statistics; Psychometrics; Rao, C.R., Sinharay, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2007; Volume 26, pp. 297–358.
  8. Robitzsch, A. Comparing the robustness of the structural after measurement (SAM) approach to structural equation modeling (SEM) against local model misspecifications with alternative estimation approaches. Stats 2022, 5, 631–672.
  9. Wu, H.; Browne, M.W. Quantifying adventitious error in a covariance structure as a random effect. Psychometrika 2015, 80, 571–600.
  10. Wu, H. An Empirical Bayesian Approach to Misspecified Covariance Structures. Unpublished Thesis, Ohio State University, Columbus, OH, USA, 2010. Available online: https://bit.ly/3HGuLFT (accessed on 9 May 2023).
  11. Uanhoro, J.O. Modeling misspecification as a parameter in Bayesian structural equation models. Educ. Psychol. Meas. 2023.
  12. Stefanski, L.A.; Boos, D.D. The calculus of M-estimation. Am. Stat. 2002, 56, 29–38.
  13. Bollen, K.A.; Davis, W.R. Two rules of identification for structural equation models. Struct. Equ. Model. 2009, 16, 523–536.
  14. Drton, M.; Foygel, R.; Sullivant, S. Global identifiability of linear structural equation models. Ann. Stat. 2011, 39, 865–886.
  15. Meredith, W. Measurement invariance, factor analysis and factorial invariance. Psychometrika 1993, 58, 525–543.
  16. Putnick, D.L.; Bornstein, M.H. Measurement invariance conventions and reporting: The state of the art and future directions for psychological research. Dev. Rev. 2016, 41, 71–90.
  17. Boos, D.D.; Stefanski, L.A. Essential Statistical Inference; Springer: New York, NY, USA, 2013.
  18. Gourieroux, C.; Monfort, A.; Trognon, A. Pseudo maximum likelihood methods: Theory. Econometrica 1984, 52, 681–700.
  19. Kolenikov, S. Biases of parameter estimates in misspecified structural equation models. Sociol. Methodol. 2011, 41, 119–157.
  20. White, H. Maximum likelihood estimation of misspecified models. Econometrica 1982, 50, 1–25.
  21. Browne, M.W. Generalized least squares estimators in the analysis of covariance structures. S. Afr. Stat. J. 1974, 8, 1–24. Available online: https://bit.ly/3yviejm (accessed on 9 May 2023).
  22. Savalei, V. Understanding robust corrections in structural equation modeling. Struct. Equ. Model. 2014, 21, 149–160.
  23. MacCallum, R.C.; Browne, M.W.; Cai, L. Factor Analysis Models as Approximations. In Factor Analysis at 100; Cudeck, R., MacCallum, R.C., Eds.; Lawrence Erlbaum: Hillsdale, NJ, USA, 2007; pp. 153–175.
  24. Held, L.; Sabanés Bové, D. Applied Statistical Inference; Springer: Berlin/Heidelberg, Germany, 2014.
  25. Robitzsch, A. Model-robust estimation of multiple-group structural equation models. Algorithms 2023, 16, 210.
  26. Ver Hoef, J.M. Who invented the delta method? Am. Stat. 2012, 66, 124–127.
  27. Gelman, A.; Hill, J. Data Analysis Using Regression and Multilevel/Hierarchical Models; Cambridge University Press: Cambridge, UK, 2006.
  28. Boker, S.; Neale, M.; Maes, H.; Wilde, M.; Spiegel, M.; Brick, T.; Spies, J.; Estabrook, R.; Kenny, S.; Bates, T.; et al. OpenMx: An open source extended structural equation modeling framework. Psychometrika 2011, 76, 306–317.
  29. Fox, J. Teacher’s corner: Structural equation modeling with the sem package in R. Struct. Equ. Model. 2006, 13, 465–486.
  30. Rosseel, Y. lavaan: An R package for structural equation modeling. J. Stat. Softw. 2012, 48, 1–36.
  31. Searle, S.R.; Casella, G.; McCulloch, C.E. Variance Components; Wiley: New York, NY, USA, 1992.
  32. Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap; CRC Press: Boca Raton, FL, USA, 1994.
  33. Chen, Y.; Li, C.; Xu, G. DIF statistical inference and detection without knowing anchoring items. arXiv 2021, arXiv:2110.11112.
  34. Wang, W.; Liu, Y.; Liu, H. Testing differential item functioning without predefined anchor items using robust regression. J. Educ. Behav. Stat. 2022, 47, 666–692.
  35. Funder, D.C.; Gardiner, G. MIsgivings about measurement invariance. PsyArXiv 2023.
  36. Robitzsch, A. Estimation methods of the multiple-group one-dimensional factor model: Implied identification constraints in the violation of measurement invariance. Axioms 2022, 11, 119.
  37. Robitzsch, A.; Lüdtke, O. Why full, partial, or approximate measurement invariance are not a prerequisite for meaningful and valid group comparisons. Struct. Equ. Model. 2023, 1–12.
  38. Welzel, C.; Inglehart, R.F. Misconceptions of measurement equivalence: Time for a paradigm shift. Comp. Political Stud. 2016, 49, 1068–1094.
  39. Monseur, C.; Berezner, A. The computation of equating errors in international surveys in education. J. Appl. Meas. 2007, 8, 323–335.
  40. Monseur, C.; Sibberns, H.; Hastedt, D. Linking errors in trend estimation for international surveys in education. IERI Monogr. Ser. 2008, 1, 113–122.
  41. Robitzsch, A.; Lüdtke, O. Linking errors in international large-scale assessments: Calculation of standard errors for trend estimation. Assess. Educ. 2019, 26, 444–465.
  42. Robitzsch, A. Linking error in the 2PL model. J 2023, 6, 58–84.
  43. Knoppen, D.; Saris, W. Do we have to combine values in the Schwartz’ human values scale? A comment on the Davidov studies. Surv. Res. Methods 2009, 3, 91–103.
  44. Beierlein, C.; Davidov, E.; Schmidt, P.; Schwartz, S.H.; Rammstedt, B. Testing the discriminant validity of Schwartz’ portrait value questionnaire items—A replication and extension of Knoppen and Saris (2009). Surv. Res. Methods 2012, 6, 25–36.
  45. Asparouhov, T.; Muthén, B. Multiple-group factor analysis alignment. Struct. Equ. Model. 2014, 21, 495–508.
  46. Gifi, A. Nonlinear Multivariate Analysis; Wiley: New York, NY, USA, 1990.
  47. Oberski, D.L. Evaluating sensitivity of parameters of interest to measurement invariance in latent variable models. Polit. Anal. 2014, 22, 45–60.
  48. R Core Team. R: A Language and Environment for Statistical Computing; The R Foundation for Statistical Computing: Vienna, Austria, 2023; Available online: https://www.R-project.org/ (accessed on 15 March 2023).
  49. Robitzsch, A. sirt: Supplementary Item Response Theory Models; The R Foundation for Statistical Computing: Vienna, Austria, 2023; R package version 3.13-162; Available online: https://github.com/alexanderrobitzsch/sirt (accessed on 9 May 2023).
  50. Brennan, R.L. Generalizability Theory; Springer: New York, NY, USA, 2001.
  51. Cronbach, L.J.; Gleser, G.C.; Nanda, H.; Rajaratnam, N. The Dependability of Behavioral Measurements: Theory of Generalizability for Scores and Profiles; Wiley: New York, NY, USA, 1972.
  52. Husek, T.R.; Sirotnik, K. Item Sampling in Educational Research; CSEIP Occasional Report No. 2; University of California: Los Angeles, CA, USA, 1967; Available online: https://bit.ly/3k47t1s (accessed on 8 May 2023).
  53. Hunter, J.E. Probabilistic foundations for coefficients of generalizability. Psychometrika 1968, 33, 1–18.
  54. McDonald, R.P. Generalizability in factorable domains: “Domain validity and generalizability”. Educ. Psychol. Meas. 1978, 38, 75–79.
  55. McDonald, R.P. Behavior domains in theory and in practice. Alta. J. Educ. Res. 2003, 49, 212–230.
  56. Robitzsch, A. Lp loss functions in invariance alignment and Haberman linking with few or many groups. Stats 2020, 3, 246–283.
  57. Robitzsch, A. Robust and nonrobust linking of two groups for the Rasch model with balanced and unbalanced random DIF: A comparative simulation study and the simultaneous assessment of standard errors and linking errors with resampling techniques. Symmetry 2021, 13, 2198.
  58. Steyer, R.; Sengewald, E.; Hahn, S. Some comments on Wu and Browne. Psychometrika 2015, 80, 608–610.
Table 1. Empirical example: estimated model parameters with their standard errors, misspecification errors, and total errors.
Par    Est        SE     ME     JKME   TE
ν1     3.070      0.021  0.097  -      0.100
ν2     2.698      0.017  0.094  -      0.095
ν3     2.602      0.022  0.126  -      0.127
ν4     2.678      0.019  0.101  -      0.103
λ1     0.591      0.021  0.029  -      0.036
λ2     0.567      0.021  0.029  -      0.036
λ3     0.685      0.024  0.032  -      0.040
λ4     0.556      0.018  0.029  -      0.034
α1     0 (fixed)  -      -      -      -
α2     0.062      0.046  0.226  0.329  0.231
α3     0.186      0.061  0.187  0.154  0.197
α4     0.326      0.052  0.205  0.339  0.212
α5     −0.421     0.031  0.220  0.206  0.223
α6     −0.109     0.055  0.214  0.188  0.221
α7     −0.012     0.049  0.215  0.141  0.220
α8     −0.504     0.042  0.239  0.236  0.243
α9     0.232      0.041  0.206  0.174  0.210
α10    −0.544     0.050  0.213  0.211  0.218
ϕ1     1 (fixed)  -      -      -      -
ϕ2     1.329      0.141  0.135  0.333  0.195
ϕ3     1.132      0.097  0.118  0.183  0.153
ϕ4     1.534      0.149  0.152  0.190  0.213
ϕ5     1.363      0.100  0.123  0.307  0.158
ϕ6     1.423      0.100  0.131  0.477  0.164
ϕ7     2.164      0.172  0.154  0.434  0.231
ϕ8     1.370      0.116  0.135  0.284  0.178
ϕ9     1.142      0.086  0.105  0.308  0.136
ϕ10    1.148      0.092  0.116  0.420  0.148
Note. Par = model parameter; Est = parameter estimate; SE = standard error; ME = misspecification error; JKME = misspecification error estimated by jackknifing items; TE = total error based on (33); - = not reported; νi = item intercept; λi = factor loading (i = 1, …, 4); αg = factor mean; ϕg = factor variance (g = 1, …, 10). The factor mean α1 and the factor variance ϕ1 of the first country were fixed to 0 and 1, respectively.
Table 2. Empirical example: estimated factor means and factor standard deviations after population standardization (i.e., mean of 0 and standard deviation of 1 in the total population) with their standard errors, misspecification errors, and total errors.
Country  Est     SE     ME     JKME   TE
Factor Means
1        0.065   0.024  0.124  0.102  0.127
2        0.117   0.023  0.134  0.166  0.136
3        0.220   0.030  0.113  0.074  0.117
4        0.336   0.029  0.116  0.156  0.119
5        −0.285  0.025  0.106  0.215  0.109
6        −0.026  0.031  0.118  0.209  0.122
7        0.056   0.030  0.110  0.026  0.114
8        −0.354  0.025  0.127  0.135  0.130
9        0.258   0.022  0.135  0.136  0.137
10       −0.387  0.023  0.130  0.108  0.132
Factor Standard Deviations
1        0.831   0.027  0.032  0.089  0.042
2        0.958   0.024  0.024  0.034  0.034
3        0.884   0.022  0.027  0.044  0.035
4        1.029   0.024  0.027  0.106  0.036
5        0.970   0.019  0.023  0.033  0.030
6        0.991   0.027  0.028  0.072  0.039
7        1.223   0.019  0.023  0.015  0.029
8        0.973   0.029  0.027  0.049  0.039
9        0.888   0.017  0.030  0.046  0.035
10       0.890   0.020  0.031  0.081  0.037
Note. Est = parameter estimate; SE = standard error; ME = misspecification error; JKME = misspecification error estimated by jackknifing items; TE = total error based on (33).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Robitzsch, A. Modeling Model Misspecification in Structural Equation Models. Stats 2023, 6, 689-705. https://doi.org/10.3390/stats6020044

