Article

Comparing the Robustness of the Structural after Measurement (SAM) Approach to Structural Equation Modeling (SEM) against Local Model Misspecifications with Alternative Estimation Approaches

by
Alexander Robitzsch
1,2
1
IPN—Leibniz Institute for Science and Mathematics Education, Olshausenstraße 62, 24118 Kiel, Germany
2
Centre for International Student Assessment (ZIB), Olshausenstraße 62, 24118 Kiel, Germany
Stats 2022, 5(3), 631-672; https://doi.org/10.3390/stats5030039
Submission received: 22 June 2022 / Revised: 13 July 2022 / Accepted: 19 July 2022 / Published: 22 July 2022
(This article belongs to the Special Issue Robust Statistics in Action)

Abstract

Structural equation models (SEM), or confirmatory factor analysis as a special case, contain model parameters in the measurement part and the structural part. In most social-science SEM applications, all parameters are simultaneously estimated in a one-step approach (e.g., with maximum likelihood estimation). In a recent article, Rosseel and Loh (2022, Psychol. Methods) proposed a two-step structural after measurement (SAM) approach to SEM that estimates the parameters of the measurement model in the first step and the parameters of the structural model in the second step. Rosseel and Loh claimed that SAM is more robust to local model misspecifications (i.e., cross loadings and residual correlations) than one-step maximum likelihood estimation. In this article, it is demonstrated with analytical derivations and simulation studies that SAM is generally not more robust to misspecifications than one-step estimation approaches. Alternative estimation methods are proposed that provide more robustness to misspecifications. SAM suffers from finite-sample bias that depends on the size of factor reliability and factor correlations. A bootstrap-bias-corrected LSAM estimate provides less biased estimates in finite samples. Nevertheless, we argue in the discussion section that applied researchers should adopt SAM because robustness to local misspecifications is an irrelevant property when applying SAM. Parameter estimates in a structural model are of interest because intentionally misspecified SEMs frequently offer clearly interpretable factors. In contrast, SEMs with some empirically driven model modifications will result in biased estimates of the structural parameters because the meaning of the factors is unintentionally changed.

1. Introduction

Confirmatory factor analysis (CFA) and structural equation models (SEM) are among the most important statistical tools for analyzing multivariate data (i.e., items) in the social sciences [1,2,3,4,5,6,7,8,9,10]. These models relate a multivariate vector X = (X_1, …, X_K) of observed variables to a vector of latent variables (i.e., factors) F of lower dimension using a linear model
X = μ + Λ F + E ,
where μ is the vector of expected values of X , Λ is a loading matrix, and E is a vector of residual variables. We assume E ( F ) = 0 and Φ = Var ( F ) is the covariance matrix of the factor variables. Moreover, Ψ = Var ( E ) denotes the matrix of residual covariances. If Σ = Var ( X ) denotes the observed covariance matrix of the vector X , Equation (1) can be written as
$$ \Sigma = \Lambda \Phi \Lambda^\top + \Psi . \qquad (2) $$
Some identification constraints must be imposed to estimate the covariance structure model (2). Frequently, the factor loading matrix Λ follows a simple structure; that is, every item X i only loads on one factor, while all other loadings equal zero. Moreover, Ψ is often assumed to be a diagonal matrix. Maximum likelihood (ML) and unweighted least squares (ULS) are most frequently used for estimating factor model (2) (see [3,6]).
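As a brief illustration of how such a factor model is estimated in practice, the following R sketch simulates data from a simple-structure two-factor model and fits it with ML and ULS using the lavaan package. The population values (loadings 0.55 and 0.45, factor correlation 0.6) are taken from the simulation studies below; the variable names and sample size are illustrative assumptions.

```r
# Minimal sketch: fitting a two-factor simple-structure CFA with ML and ULS.
library(lavaan)
set.seed(1)

# population model: simple-structure loadings 0.55 / 0.45, factor correlation 0.6
Lam <- cbind(c(.55,.55,.55,0,0,0), c(0,0,0,.45,.45,.45))
Phi <- matrix(c(1,.6,.6,1), 2, 2)
Psi <- diag(1 - rowSums(Lam %*% Phi * Lam))     # unit item variances
Sigma <- Lam %*% Phi %*% t(Lam) + Psi
dat <- as.data.frame(MASS::mvrnorm(500, mu = rep(0, 6), Sigma = Sigma))
colnames(dat) <- c("X1","X2","X3","Y1","Y2","Y3")

model <- ' FX =~ X1 + X2 + X3
           FY =~ Y1 + Y2 + Y3 '
fit_ml  <- cfa(model, data = dat, std.lv = TRUE, estimator = "ML")
fit_uls <- cfa(model, data = dat, std.lv = TRUE, estimator = "ULS")

# estimated factor correlation under both estimators
subset(parameterEstimates(fit_ml),  lhs == "FX" & rhs == "FY")
subset(parameterEstimates(fit_uls), lhs == "FX" & rhs == "FY")
```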
In SEMs, relationships between latent variables and constraints in the covariance can be modeled. Hence, the covariance matrix Φ = Φ ( ξ ) is a function of a parameter ξ that contains regression coefficients and residual variances and covariances of latent variables. Of particular interest are non-saturated models that genuinely constrain the covariance matrix. The model Φ ( ξ ) is also referred to as the structural model, while (1) refers to the measurement model for all observed variables X .
In this article, we assume that all random variables F , E (and, hence, X ) in Equation (1) are multivariate normally distributed. ML and ULS estimation methods will typically remain consistent for non-normally distributed variables [11,12]. There is literature investigating the robustness of SEM estimation to non-normal distributions [13,14,15]. This kind of robustness that relies on contaminated distributions [16,17,18,19] is not the target of this article. Instead, we rely on the concept of model robustness; that is, how robust is an SEM estimation method in the presence of local model misspecifications.
In SEMs, parameters of the measurement models (i.e., Λ and Ψ) and the structural model (i.e., Φ(ξ)) are most often simultaneously estimated with ML or ULS. It has been argued in the literature that local model misspecifications can impact the estimates of the structural model. In this article, we focus on unmodelled cross loadings (i.e., elements in Λ that are fixed to zero but are nonzero in the data-generating model) and unmodelled residual correlations (i.e., elements in Ψ that are fixed to zero but are nonzero in the data-generating model). In a recently published article by Rosseel and Loh (2022, Psychol. Methods, forthcoming, [20]; see the OSF page https://osf.io/pekbm/ for preprint versions, accessed on 22 June 2022), a two-step estimation method, structural after measurement (SAM), was proposed whose “estimates are more robust against local model misspecifications” [20] compared to ML or ULS estimation. The main idea is that the parameters of the measurement model are estimated in the first step. The measurement models are typically estimated separately for each of the factors. SAM can be used if cross loadings and residual correlations are intentionally omitted in this first step of fitting single measurement models. The second step of the SAM approach consists of estimating the structural model. Global SAM (GSAM) uses the estimated parameters of the measurement model and fixes them in the second step when estimating Φ(ξ). In contrast, local SAM (LSAM) constructs an estimate Φ̂_LSAM based on the measurement model parameters from the first step and computes the parameter ξ by fitting the model Φ(ξ) to the input matrix Φ̂_LSAM.
In this article, we compare the two-step SAM estimation approach with the one-step estimation methods ML and ULS. In particular, we investigate whether SAM is really more robust to local model misspecifications than ML and ULS. Moreover, we also compare these approaches with estimation methods that rely on robust loss functions. The rest of the article is structured as follows. In Section 2, the estimation of SEMs and of the two-dimensional CFA model under local model misspecifications is treated analytically. Section 3 introduces alternative model-robust estimation methods that provide additional robustness to misspecifications. Section 4 includes six simulation studies that compare the performance of the different estimators in the CFA model under local misspecifications. Finally, Section 5 closes with a discussion.

2. Estimation under Local Model Misspecification

In this section, the consequences of local model misspecifications in a CFA model [21,22,23,24,25,26,27,28] are discussed. In more detail, consequences of unmodelled cross loadings and residual correlations are investigated. The treatment mainly focuses on ULS estimation. However, due to the similarity of ULS and ML estimation, it is likely that our findings also generalize to ML.
SEMs, and CFA models as a particular case, parametrize an observed covariance matrix S as a model-implied covariance matrix Σ with fewer model parameters. Let s include the vectorized elements of S and γ be the vector of model parameters. The vectorized elements of the model-implied covariance matrix Σ = Σ ( γ ) are denoted by σ . The model parameters γ are estimated using an estimating function H that implicitly defines a function γ = g ( s ) by solving the following equation with respect to γ :
H ( s , γ ) = 0 .
Note that the estimating Equation (3) can refer to ULS, ML, or any other estimation approach. The estimation approach in Equation (3) is also referred to as M-estimation [29,30,31,32,33] or quasi-likelihood estimation [34,35]. In contrast to ML estimation, the estimating function (3) does not need to correspond to a correctly specified model. Instead, by choosing an estimating function H, a functional of the distribution (i.e., of the data and its sufficient statistic s in the case of the multivariate normal distribution) is defined as the target estimand of interest. In this sense, the estimation approach can be regarded as model-agnostic [36].
Assume that there exists a true parameter without local model misspecifications. That is, there exists a γ 0 such that σ 0 = σ ( γ 0 ) . In this case, the model-implied covariance matrix of model parameters γ 0 exactly reproduces the observed covariance matrix s at the population level because there is no model error. Formally, in the absence of model misspecifications, it holds that
H ( σ 0 , γ 0 ) = 0 .
We now apply a Taylor expansion to (3) (see [23,37,38]) and arrive at
$$ 0 = H(s, \gamma) = H(\sigma_0, \gamma_0) + \frac{\partial H}{\partial \sigma}(s - \sigma_0) + \frac{\partial H}{\partial \gamma}(\gamma - \gamma_0) . \qquad (5) $$
By taking (4) into account, we obtain from (5)
$$ \gamma - \gamma_0 = - \left( \frac{\partial H}{\partial \gamma} \right)^{-1} \frac{\partial H}{\partial \sigma} \, (s - \sigma_0) . \qquad (6) $$
Equation (6) is well-known as the implicit function theorem for obtaining a derivative of an implicitly defined function g. It characterizes the bias of the estimate γ if observed covariances s deviate from a true covariance matrix σ 0 in the ideal situation in the absence of model misspecifications.

2.1. Numerical Analysis of Sensitivity to Model Misspecifications

The stability of an estimate can also be studied numerically. The main idea is to obtain insight into the consequences of Equation (6). Suppose that model error (i.e., model misspecification) is introduced by some parameter δ, and the implied covariance matrix is given by s = σ(γ_0, δ). It is of interest how the bias γ − γ_0 is affected by the presence of model misspecification. If an explicit estimating function g were known such that γ = g(s) = g(γ_0, δ), one could approximate this function by
$$ \gamma - \gamma_0 = \frac{\partial g}{\partial \delta} \, \delta . \qquad (7) $$
If e i denotes the i-th unit vector, one can determine the estimate γ i by computing g ( γ 0 , Δ i e i )  (see [39]). This operation means that only the i-th component of the misspecification vector is modified. The parameter change can then be computed as
$$ \frac{\gamma_i - \gamma_0}{\Delta_i} = \frac{g(\gamma_0, \Delta_i e_i) - g(\gamma_0, 0)}{\Delta_i} . \qquad (8) $$
The vector of difference quotients in (8) approximates the partial derivatives ∂g/∂δ in Equation (7). It also provides insights into how parameters would change when introducing model misspecifications. Note that this approach assumes a linearization of the estimation function. Moreover, the increment Δ_i must be carefully chosen to obtain generalizable results.
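A minimal R sketch of this numerical sensitivity analysis is given below: the population covariance matrix of the correctly specified model is perturbed by a single misspecification parameter (here, one residual covariance), the analysis model is refitted to the perturbed matrix, and the difference quotient (8) is formed. The helper function, variable names, and chosen increment are illustrative assumptions, not part of the original study.

```r
# Sketch of the parameter sensitivity analysis in Equation (8), using lavaan.
library(lavaan)

# population matrices of the correctly specified two-factor model
Lam <- cbind(c(.55,.55,.55,0,0,0), c(0,0,0,.45,.45,.45))
Phi <- matrix(c(1,.6,.6,1), 2, 2)
Psi <- diag(1 - rowSums(Lam %*% Phi * Lam))
vnames <- c("X1","X2","X3","Y1","Y2","Y3")

model <- ' FX =~ X1 + X2 + X3
           FY =~ Y1 + Y2 + Y3 '

# fit the analysis model to a population covariance matrix and return phi
fit_phi <- function(S) {
  rownames(S) <- colnames(S) <- vnames
  fit <- cfa(model, sample.cov = S, sample.nobs = 1e6, std.lv = TRUE,
             estimator = "ULS")
  coef(fit)[["FX~~FY"]]
}

Sigma0 <- Lam %*% Phi %*% t(Lam) + Psi

# perturb one residual covariance (X1, Y1) by the increment Delta
Delta  <- 0.05
Sigma1 <- Sigma0
Sigma1[1, 4] <- Sigma1[4, 1] <- Sigma1[1, 4] + Delta

# difference quotient of Equation (8): parameter sensitivity
( fit_phi(Sigma1) - fit_phi(Sigma0) ) / Delta
```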

2.2. Two-Factor CFA Model with Local Model Misspecifications

In the rest of the paper, we focus on the two-dimensional CFA. It is assumed that three items X i measure a factor F X and three items Y i measure a factor F Y ( i = 1 , 2 , 3 ). The model matrices of the estimated CFA model are given by
$$ \Lambda = \begin{pmatrix} \lambda_{X_1} & 0 \\ \lambda_{X_2} & 0 \\ \lambda_{X_3} & 0 \\ 0 & \lambda_{Y_1} \\ 0 & \lambda_{Y_2} \\ 0 & \lambda_{Y_3} \end{pmatrix}, \quad \Phi = \begin{pmatrix} 1 & \phi \\ \phi & 1 \end{pmatrix} \quad \text{and} \quad \Psi = \mathrm{diag}(\psi_{X_1}, \psi_{X_2}, \psi_{X_3}, \psi_{Y_1}, \psi_{Y_2}, \psi_{Y_3}) . \qquad (9) $$
The correlation ϕ between the two latent factors is of primary interest. However, we study the estimation of the two-dimensional CFA model (9) under local model misspecifications. In this case, we denote by λ X , 0 and λ Y , 0 the factor loadings of X and Y in (9) without any misspecification. Moreover, ϕ 0 is the true factor correlation, Φ 0 the corresponding correlation matrix, Λ 0 the factor loading matrix with simple structure and Ψ 0 the diagonal matrix of variances of residual errors. The analysis model (9) is graphically displayed in Figure 1.
We induce model misspecification (9) by cross loadings δ X = ( δ X 1 , δ X 2 , δ X 3 ) and δ Y = ( δ Y 1 , δ Y 2 , δ Y 3 ) as well as residual error correlations ψ X i Y j . However, we assume that there are no residual error correlations between items in the same measurement model (i.e., ψ X i X j = 0 and ψ Y i Y j = 0 for i j ). All residual error correlations ψ X i Y j are collected in the vector ψ . The data-generating model under local model misspecification can be generally written as
$$ \Lambda = \begin{pmatrix} \lambda_{X_1} & \delta_{X_1} \\ \lambda_{X_2} & \delta_{X_2} \\ \lambda_{X_3} & \delta_{X_3} \\ \delta_{Y_1} & \lambda_{Y_1} \\ \delta_{Y_2} & \lambda_{Y_2} \\ \delta_{Y_3} & \lambda_{Y_3} \end{pmatrix}, \quad \Phi = \begin{pmatrix} 1 & \phi \\ \phi & 1 \end{pmatrix} \quad \text{and} \quad \Psi = \begin{pmatrix} \psi_{X_1} & & & & & \\ 0 & \psi_{X_2} & & & & \\ 0 & 0 & \psi_{X_3} & & & \\ \psi_{X_1 Y_1} & \psi_{X_1 Y_2} & \psi_{X_1 Y_3} & \psi_{Y_1} & & \\ \psi_{X_2 Y_1} & \psi_{X_2 Y_2} & \psi_{X_2 Y_3} & 0 & \psi_{Y_2} & \\ \psi_{X_3 Y_1} & \psi_{X_3 Y_2} & \psi_{X_3 Y_3} & 0 & 0 & \psi_{Y_3} \end{pmatrix} , \qquad (10) $$

where only the lower triangle of the symmetric matrix Ψ is displayed.
The data-generating model (10) is graphically displayed in Figure 2.
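For concreteness, the following R sketch constructs the population covariance matrix implied by the data-generating model (10) for one specific choice of cross loadings and residual correlations and draws a multivariate normal sample from it; the chosen misspecification values are illustrative assumptions.

```r
# Sketch: simulating data from the locally misspecified model (10).
library(MASS)
set.seed(2)

lamX <- c(.55,.55,.55); lamY <- c(.45,.45,.45)
deltaX <- c(.30, 0, 0)         # one cross loading of X1 on F_Y (illustrative)
deltaY <- c(0, 0, 0)
phi <- 0.6

Lam <- rbind(cbind(lamX, deltaX), cbind(deltaY, lamY))
Phi <- matrix(c(1, phi, phi, 1), 2, 2)

Psi <- diag(1 - rowSums(Lam %*% Phi * Lam))  # unit item variances
Psi[5, 2] <- Psi[2, 5] <- 0.12               # residual correlation psi_X2Y2 (illustrative)

Sigma <- Lam %*% Phi %*% t(Lam) + Psi
dat <- mvrnorm(n = 1000, mu = rep(0, 6), Sigma = Sigma)
colnames(dat) <- c("X1","X2","X3","Y1","Y2","Y3")
```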
In the following sections, we discuss the impact of local model misspecifications (i.e., cross loadings and residual error correlations) on parameter estimates.

2.3. Nonrobustness of the LSAM Approach

We now study the estimated factor correlation matrix in the LSAM approach at the population level. The factor loadings and measurement error variances are independently estimated for the two factors. Note that measurement model parameters are identified with only three items. A simple-structure loading matrix Λ ^ LSAM and the diagonal matrix of estimated error variances Ψ ^ SAM are defined, which contain the parameters of the two measurement models.
The LSAM approach employs a mapping matrix M LSAM [20] that fulfills the property M LSAM Λ ^ LSAM = I , where I is the identity matrix. Rosseel and Loh [20] proposed
$$ M_{\mathrm{LSAM}} = ( \hat{\Lambda}_{\mathrm{LSAM}}^\top V \hat{\Lambda}_{\mathrm{LSAM}} )^{-1} \hat{\Lambda}_{\mathrm{LSAM}}^\top V \qquad (11) $$
with a positive definite matrix V. The choice V = I results in LSAM-ULS estimation. In the following, we only study the case V = I for ease of presentation.
The factor correlation matrix Φ ^ LSAM in LSAM is estimated by
$$ \hat{\Phi}_{\mathrm{LSAM}} = M_{\mathrm{LSAM}} ( S - \hat{\Psi}_{\mathrm{SAM}} ) M_{\mathrm{LSAM}}^\top . \qquad (12) $$
If we assume data at the population level, the covariance matrix is given by
$$ S = \Lambda \Phi_0 \Lambda^\top + \Psi \qquad (13) $$
with model matrices defined in (10).
We can now derive the estimated factor correlation matrix in LSAM at the population level:
$$ \begin{aligned} \hat{\Phi}_{\mathrm{LSAM}} &= M_{\mathrm{LSAM}} ( \Lambda \Phi_0 \Lambda^\top + \Psi - \hat{\Psi}_{\mathrm{SAM}} ) M_{\mathrm{LSAM}}^\top \\ &= M_{\mathrm{LSAM}} ( \hat{\Lambda}_{\mathrm{LSAM}} \Phi_0 \hat{\Lambda}_{\mathrm{LSAM}}^\top - \hat{\Lambda}_{\mathrm{LSAM}} \Phi_0 \hat{\Lambda}_{\mathrm{LSAM}}^\top + \Lambda \Phi_0 \Lambda^\top + \Psi - \hat{\Psi}_{\mathrm{SAM}} ) M_{\mathrm{LSAM}}^\top \\ &= \Phi_0 + M_{\mathrm{LSAM}} ( \Lambda \Phi_0 \Lambda^\top - \hat{\Lambda}_{\mathrm{LSAM}} \Phi_0 \hat{\Lambda}_{\mathrm{LSAM}}^\top + \Psi - \hat{\Psi}_{\mathrm{SAM}} ) M_{\mathrm{LSAM}}^\top \\ &= \Phi_0 + M_{\mathrm{LSAM}} ( \Delta_1 + \Delta_2 ) M_{\mathrm{LSAM}}^\top \end{aligned} \qquad (14) $$
It can be seen that (14) contains two bias components. The matrix M_LSAM serves as a scaling matrix of the two bias terms. The first bias component Δ_1 = Λ Φ_0 Λ′ − Λ̂_LSAM Φ_0 Λ̂′_LSAM quantifies the discrepancy between the true loading matrix Λ under misspecification and the estimated simple-structure loading matrix Λ̂_LSAM. The second bias component Δ_2 = Ψ − Ψ̂_SAM captures unmodelled residual error correlations. Hence, Equation (14) illustrates that LSAM is not immune to unmodelled cross loadings and unmodelled residual error correlations. In the following Section 2.4, we compare ULS and GSAM estimation under the locally misspecified two-dimensional CFA model.
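Before turning to that comparison, the following R sketch evaluates Equation (12) at the population level for the two-factor model: the measurement models are fitted separately to the X and Y blocks of the population covariance matrix, the mapping matrix (11) with V = I is formed, and the resulting Φ̂_LSAM can be compared with the true Φ_0. It is a simplified stand-alone illustration of the LSAM logic, not the lavaan implementation of Rosseel and Loh.

```r
# Sketch: population-level LSAM estimate of the factor correlation (V = I).
library(lavaan)

# data-generating model with one cross loading and one residual correlation
Lam  <- rbind(cbind(c(.55,.55,.55), c(.30,0,0)), cbind(c(0,0,0), c(.45,.45,.45)))
Phi0 <- matrix(c(1,.6,.6,1), 2, 2)
Psi  <- diag(1 - rowSums(Lam %*% Phi0 * Lam))
Psi[5,2] <- Psi[2,5] <- 0.12
S <- Lam %*% Phi0 %*% t(Lam) + Psi
vn <- c("X1","X2","X3","Y1","Y2","Y3"); dimnames(S) <- list(vn, vn)

# step 1: separate single-factor measurement models (simple structure)
fitX <- cfa(' FX =~ X1 + X2 + X3 ', sample.cov = S[1:3,1:3], sample.nobs = 1e6,
            std.lv = TRUE, estimator = "ULS")
fitY <- cfa(' FY =~ Y1 + Y2 + Y3 ', sample.cov = S[4:6,4:6], sample.nobs = 1e6,
            std.lv = TRUE, estimator = "ULS")
lamX_hat <- coef(fitX)[1:3]; lamY_hat <- coef(fitY)[1:3]
psi_hat  <- c(coef(fitX)[4:6], coef(fitY)[4:6])

Lam_hat <- cbind(c(lamX_hat, 0,0,0), c(0,0,0, lamY_hat))
M <- solve(t(Lam_hat) %*% Lam_hat) %*% t(Lam_hat)   # mapping matrix, V = I

# step 2: LSAM estimate of the factor covariance matrix, Equation (12)
Phi_LSAM <- M %*% (S - diag(psi_hat)) %*% t(M)
Phi_LSAM            # compare with Phi0: the off-diagonal entry is biased
```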

2.4. Comparison of ULS and GSAM in a Congeneric Measurement Model

In this subsection, we compare ULS and GSAM estimation of the two-dimensional CFA model under local model misspecifications. The two resulting estimates of the factor correlation are compared. It is shown that the two estimation approaches will typically yield different estimates.
The ULS fitting function in the CFA model can be written as
H ( λ X , λ Y , ϕ , s ) = H X X ( λ X , s ) + H Y Y ( λ Y , s ) + H X Y ( λ X , λ Y , ϕ , s ) ,
where we define
$$ H_{XX}(\lambda_X, s) = \sum_{i \neq j} ( s_{X_i X_j} - \lambda_{X_i} \lambda_{X_j} )^2 , \qquad (16) $$
$$ H_{YY}(\lambda_Y, s) = \sum_{i \neq j} ( s_{Y_i Y_j} - \lambda_{Y_i} \lambda_{Y_j} )^2 \quad \text{and} \qquad (17) $$
$$ H_{XY}(\lambda_X, \lambda_Y, \phi, s) = \sum_{i=1}^{3} \sum_{j=1}^{3} ( s_{X_i Y_j} - \lambda_{X_i} \lambda_{Y_j} \phi )^2 . \qquad (18) $$
The ULS parameter estimates are obtained in a one-step estimation approach:
( λ ^ X , ULS , λ ^ Y , ULS , ϕ ^ ULS ) = arg min ( λ X , λ Y , ϕ ) H ( λ X , λ Y , ϕ , s ) .
In contrast, GSAM is a two-step approach. It first determines factor loadings λ ^ X , SAM and λ ^ Y , SAM in separate measurement models:
λ ^ X , SAM = arg min λ X H X X ( λ X , s ) and
λ ^ Y , SAM = arg min λ Y H Y Y ( λ Y , s ) .
In a second step, the factor correlation is estimated while fixing the factor loadings to the estimates obtained in (20) and (21):
$$ \hat{\phi}_{\mathrm{GSAM}} = \arg \min_{\phi} H_{XY}( \hat{\lambda}_{X,\mathrm{SAM}}, \hat{\lambda}_{Y,\mathrm{SAM}}, \phi, s ) . \qquad (22) $$
We can take the derivative of (15) to obtain estimating equations for the ULS approach for the model parameters λ X , λ Y and ϕ . We arrive at
$$ H_{\lambda_X} = \frac{\partial H}{\partial \lambda_X} = 0 , \quad H_{\lambda_Y} = \frac{\partial H}{\partial \lambda_Y} = 0 \quad \text{and} \quad H_{\phi} = \frac{\partial H}{\partial \phi} = 0 . \qquad (23) $$
For example, the estimating equation for λ X can be written as
H λ X ( λ X , s ) = H X X , λ X ( λ X , s ) + H X Y , λ X ( λ X , λ Y , ϕ , s ) = 0 ,
where H X X , λ X denotes the vector of partial derivatives of H X X with respect to λ X . The other quantities are defined similarly. The estimating equations for λ Y and ϕ are given by
H λ Y ( λ Y , s ) = H Y Y , λ Y ( λ Y , s ) + H X Y , λ Y ( λ X , λ Y , ϕ , s ) = 0 and
H ϕ ( λ X , λ Y , ϕ , s ) = H X Y , ϕ ( λ X , λ Y , ϕ , s ) = 0 .
In order to study the impact of model misspecification, we derive the estimates λ ^ X , ULS , λ ^ Y , ULS and ϕ ^ ULS as a function of true model parameters λ X , 0 , λ Y , 0 and ϕ 0 , and the parameters indicating model misspecifications; that is, cross loadings δ X and δ Y and residual correlations ψ . We use a linear Taylor expansion of (24)–(26) around true parameters. For the estimate ϕ ^ ULS of ϕ , we rely on the Taylor expansion
$$ \begin{aligned} 0 = H_{XY,\phi}(\lambda_X, \lambda_Y, \phi, s) = {} & H_{XY,\phi}(\lambda_{X,0}, \lambda_{Y,0}, \phi_0, \sigma_0) + \frac{\partial H_{XY,\phi}}{\partial \phi}\Big|_0 ( \hat{\phi}_{\mathrm{ULS}} - \phi_0 ) \\ & + \frac{\partial H_{XY,\phi}}{\partial \lambda_X}\Big|_0 ( \hat{\lambda}_{X,\mathrm{ULS}} - \lambda_{X,0} ) + \frac{\partial H_{XY,\phi}}{\partial \lambda_Y}\Big|_0 ( \hat{\lambda}_{Y,\mathrm{ULS}} - \lambda_{Y,0} ) \\ & + \frac{\partial H_{XY,\phi}}{\partial \delta_X}\Big|_0 \delta_X + \frac{\partial H_{XY,\phi}}{\partial \delta_Y}\Big|_0 \delta_Y + \frac{\partial H_{XY,\phi}}{\partial \psi}\Big|_0 \psi \end{aligned} \qquad (27) $$
Note that H X Y , ϕ ( λ X , 0 , λ Y , 0 , ϕ 0 , σ 0 ) = 0 because σ 0 is the model-implied covariance matrix for the correctly specified model that fixes the misspecification parameters to zero (i.e., δ X = 0 , δ Y = 0 , and ψ = 0 ). Hence, we can rewrite (27) as
$$ \frac{\partial H_{XY,\phi}}{\partial \phi}\Big|_0 ( \hat{\phi}_{\mathrm{ULS}} - \phi_0 ) = - \frac{\partial H_{XY,\phi}}{\partial \lambda_X}\Big|_0 ( \hat{\lambda}_{X,\mathrm{ULS}} - \lambda_{X,0} ) - \frac{\partial H_{XY,\phi}}{\partial \lambda_Y}\Big|_0 ( \hat{\lambda}_{Y,\mathrm{ULS}} - \lambda_{Y,0} ) - \frac{\partial H_{XY,\phi}}{\partial \delta_X}\Big|_0 \delta_X - \frac{\partial H_{XY,\phi}}{\partial \delta_Y}\Big|_0 \delta_Y - \frac{\partial H_{XY,\phi}}{\partial \psi}\Big|_0 \psi \qquad (28) $$
All appearing second-order derivatives are evaluated at the true parameters (hence the subscript “0”). Equation (28) describes the asymptotic bias in ϕ̂_ULS due to biased loading estimates λ̂_X,ULS and λ̂_Y,ULS and the presence of the model misspecifications δ_X, δ_Y and ψ. The rationale of SAM is that the first two terms in (28) vanish because the loadings of the measurement models can be estimated with little bias.
We now derive a general expression of (∂H_{XY,ϕ}/∂ϕ)|_0 (ϕ̂_ULS − ϕ_0) for ULS estimation in order to compare it to the SAM approach. By simultaneously applying a Taylor expansion to all estimated parameters, we obtain
$$ A \begin{pmatrix} \hat{\lambda}_{X,\mathrm{ULS}} - \lambda_{X,0} \\ \hat{\lambda}_{Y,\mathrm{ULS}} - \lambda_{Y,0} \\ \hat{\phi}_{\mathrm{ULS}} - \phi_0 \end{pmatrix} = - B \begin{pmatrix} \delta_X \\ \delta_Y \\ \psi \end{pmatrix} \qquad (29) $$
The matrices A and B can be computed as
$$ A = \begin{pmatrix} \dfrac{\partial H_{XX,\lambda_X}}{\partial \lambda_X}\Big|_0 + \dfrac{\partial H_{XY,\lambda_X}}{\partial \lambda_X}\Big|_0 & \dfrac{\partial H_{XY,\lambda_X}}{\partial \lambda_Y}\Big|_0 & \dfrac{\partial H_{XY,\lambda_X}}{\partial \phi}\Big|_0 \\ \dfrac{\partial H_{XY,\lambda_Y}}{\partial \lambda_X}\Big|_0 & \dfrac{\partial H_{YY,\lambda_Y}}{\partial \lambda_Y}\Big|_0 + \dfrac{\partial H_{XY,\lambda_Y}}{\partial \lambda_Y}\Big|_0 & \dfrac{\partial H_{XY,\lambda_Y}}{\partial \phi}\Big|_0 \\ \dfrac{\partial H_{XY,\phi}}{\partial \lambda_X}\Big|_0 & \dfrac{\partial H_{XY,\phi}}{\partial \lambda_Y}\Big|_0 & \dfrac{\partial H_{XY,\phi}}{\partial \phi}\Big|_0 \end{pmatrix} \quad \text{and} \qquad (30) $$
$$ B = \begin{pmatrix} \dfrac{\partial H_{XX,\lambda_X}}{\partial \delta_X}\Big|_0 + \dfrac{\partial H_{XY,\lambda_X}}{\partial \delta_X}\Big|_0 & \dfrac{\partial H_{XY,\lambda_X}}{\partial \delta_Y}\Big|_0 & \dfrac{\partial H_{XY,\lambda_X}}{\partial \psi}\Big|_0 \\ \dfrac{\partial H_{XY,\lambda_Y}}{\partial \delta_X}\Big|_0 & \dfrac{\partial H_{YY,\lambda_Y}}{\partial \delta_Y}\Big|_0 + \dfrac{\partial H_{XY,\lambda_Y}}{\partial \delta_Y}\Big|_0 & \dfrac{\partial H_{XY,\lambda_Y}}{\partial \psi}\Big|_0 \\ \dfrac{\partial H_{XY,\phi}}{\partial \delta_X}\Big|_0 & \dfrac{\partial H_{XY,\phi}}{\partial \delta_Y}\Big|_0 & \dfrac{\partial H_{XY,\phi}}{\partial \psi}\Big|_0 \end{pmatrix} . \qquad (31) $$
The matrices can be easily computed in the two-factor model involving three indicators for both factors. We obtain, for the derivatives of the functions H X X , λ X and H Y Y , λ Y ,
H X X , λ X λ X , 0 = λ X 2 , 0 2 + λ X 3 , 0 2 λ X 1 , 0 λ X 2 , 0 λ X 1 , 0 λ X 3 , 0 λ X 2 , 0 λ X 1 , 0 λ X 1 , 0 2 + λ X 3 , 0 2 λ X 2 , 0 λ X 3 , 0 λ X 3 , 0 λ X 1 , 0 λ X 3 , 0 λ X 2 , 0 λ X 2 , 0 2 + λ X 3 , 0 2
H X X , λ X δ X , 0 = ϕ 0 H X X , λ X λ X , 0
H Y Y , λ Y λ Y , 0 = λ Y 2 , 0 2 + λ Y 3 , 0 2 λ Y 1 , 0 λ Y 2 , 0 λ Y 1 , 0 λ Y 3 , 0 λ Y 2 , 0 λ Y 1 , 0 λ Y 1 , 0 2 + λ Y 3 , 0 2 λ Y 2 , 0 λ Y 3 , 0 λ Y 3 , 0 λ Y 1 , 0 λ Y 3 , 0 λ Y 2 , 0 λ Y 2 , 0 2 + λ Y 3 , 0 2
H Y Y , λ Y δ Y , 0 = ϕ 0 H Y Y , λ Y λ Y , 0
The partial derivatives for the functions involving the H X Y terms are given by (using the notation x 2 2 = x 1 2 + x 2 2 + x 3 2 for a vector x = ( x 1 , x 2 , x 3 ) ):
H X Y , λ X λ X , 0 = λ X , 0 2 2 ϕ 0 2 1 0 0 0 1 0 0 0 0 ,
H X Y , λ X λ Y , 0 = ϕ 0 ( 1 2 ϕ 0 ) λ X 1 , 0 λ Y 1 , 0 λ X 1 , 0 λ Y 2 , 0 λ X 1 , 0 λ Y 3 , 0 λ X 2 , 0 λ Y 1 , 0 λ X 2 , 0 λ Y 2 , 0 λ X 2 , 0 λ Y 3 , 0 λ X 3 , 0 λ Y 1 , 0 λ X 3 , 0 λ Y 2 , 0 λ X 3 , 0 λ Y 3 , 0 ,
H X Y , λ X ϕ , 0 = λ Y , 0 2 2 ( 1 2 ϕ 0 ) λ X 1 , 0 λ X 2 , 0 λ X 3 , 0 ,
H X Y , λ X δ X , 0 = H X Y , λ X λ X , 0 ,
H X Y , λ X δ Y , 0 = H X Y , λ X λ Y , 0 and
H X Y , λ X ψ , 0 = ϕ 0 λ Y 1 , 0 λ Y 2 , 0 λ Y 3 , 0 0 0 0 0 0 0 0 0 0 λ Y 1 , 0 λ Y 2 , 0 λ Y 3 , 0 0 0 0 0 0 0 0 0 0 λ Y 1 , 0 λ Y 2 , 0 λ Y 3 , 0 ,
where ψ = ( ψ X 1 , Y 1 , ψ X 1 , Y 2 , , ψ X 3 , Y 2 , ψ X 3 , Y 3 ) . The partial derivatives of H X Y , λ Y can be obtained by swapping X and Y in Equations (32) to (41).
Finally, the derivatives of H X Y , ϕ can be computed as
H X Y , ϕ λ X , 0 = λ Y , 0 2 2 ( 1 2 ϕ 0 ) λ X 1 , 0 λ X 2 , 0 λ X 3 , 0 ,
H X Y , ϕ λ Y , 0 = λ X , 0 2 2 ( 1 2 ϕ 0 ) λ Y 1 , 0 λ Y 2 , 0 λ Y 3 , 0 ,
H X Y , ϕ ϕ , 0 = λ X , 0 2 2 λ Y , 0 2 2 ,
H X Y , ϕ δ X , 0 = λ Y , 0 2 2 λ X 1 , 0 λ X 2 , 0 λ X 3 , 0 ,
H X Y , ϕ δ Y , 0 = λ X , 0 2 2 λ Y 1 , 0 λ Y 2 , 0 λ Y 3 , 0 and
H X Y , ϕ ψ , 0 = λ X 1 , 0 λ Y 1 , 0 λ X 1 , 0 λ Y 2 , 0 λ X 1 , 0 λ Y 2 , 0 .
The ULS and the SAM approaches can be compared based on the derivation in Equation (28). In both SAM approaches (i.e., LSAM and GSAM), the terms involving the model misspecifications δ X , δ Y , and ψ do not vanish. However, it could be argued that the SAM approaches would be more robust if the estimates λ ^ X , SAM and λ ^ Y , SAM were close to the true loadings λ X , 0 and λ Y , 0 , respectively.
At first, we study the estimates λ ^ X , ULS and λ ^ Y , ULS . We use the estimating Equation (29) and solve them for the loadings. The determining equations can be written in a general form as follows
$$ B_{\hat{\lambda}_{X,\mathrm{ULS}}, \lambda_X} ( \hat{\lambda}_{X,\mathrm{ULS}} - \lambda_{X,0} ) = - B_{\hat{\lambda}_{X,\mathrm{ULS}}, \delta_X} \delta_X - B_{\hat{\lambda}_{X,\mathrm{ULS}}, \delta_Y} \delta_Y - B_{\hat{\lambda}_{X,\mathrm{ULS}}, \psi} \psi \qquad (48) $$
$$ B_{\hat{\lambda}_{Y,\mathrm{ULS}}, \lambda_Y} ( \hat{\lambda}_{Y,\mathrm{ULS}} - \lambda_{Y,0} ) = - B_{\hat{\lambda}_{Y,\mathrm{ULS}}, \delta_X} \delta_X - B_{\hat{\lambda}_{Y,\mathrm{ULS}}, \delta_Y} \delta_Y - B_{\hat{\lambda}_{Y,\mathrm{ULS}}, \psi} \psi \qquad (49) $$
The matrix B λ ^ X , ULS , δ X is given by
B λ ^ X , ULS , δ X = H X X , λ X δ X , 0 + H X Y , λ X δ X , 0 H X Y , λ X ϕ , 0 H X Y , ϕ ϕ , 0 1 H X Y , ϕ δ X , 0 H X Y , λ X λ Y , 0 H X Y , λ X ϕ , 0 H X Y , ϕ ϕ , 0 1 H X Y , ϕ λ Y , 0 × H Y Y , λ Y λ Y , 0 + H X Y , λ Y λ Y , 0 H X Y , λ Y ϕ , 0 H X Y , ϕ ϕ , 0 1 H X Y , ϕ λ Y , 0 1 × H X Y , λ Y δ X , 0 H X Y , λ Y ϕ , 0 H X Y , ϕ ϕ , 0 1 H X Y , ϕ δ X , 0
The symbol “×” in (50) denotes the matrix multiplication. The matrices B λ ^ X , ULS , λ X , B λ ^ X , ULS , δ Y and B λ ^ X , ULS , ψ are given as
B λ ^ X , ULS , λ X = H X X , λ X λ X , 0 + H X Y , λ X λ X , 0 H X Y , λ X ϕ , 0 H X Y , ϕ ϕ , 0 1 H X Y , ϕ λ X , 0 H X Y , λ X λ Y , 0 H X Y , λ X ϕ , 0 H X Y , ϕ ϕ , 0 1 H X Y , ϕ λ Y , 0 × H Y Y , λ Y λ Y , 0 + H X Y , λ Y λ Y , 0 H X Y , λ Y ϕ , 0 H X Y , ϕ ϕ , 0 1 H X Y , ϕ λ Y , 0 1 × H X Y , λ Y λ X , 0 H X Y , λ Y ϕ , 0 H X Y , ϕ ϕ , 0 1 H X Y , ϕ λ X , 0
B λ ^ X , ULS , δ Y = H X Y , λ X δ Y , 0 H X Y , λ X ϕ , 0 H X Y , ϕ ϕ , 0 1 H X Y , ϕ δ Y , 0 H X Y , λ X λ Y , 0 H X Y , λ X ϕ , 0 H X Y , ϕ ϕ , 0 1 H X Y , ϕ λ Y , 0 × H Y Y , λ Y λ Y , 0 + H X Y , λ Y λ Y , 0 H X Y , λ Y ϕ , 0 H X Y , ϕ ϕ , 0 1 H X Y , ϕ λ Y , 0 1 × H Y Y , λ Y δ Y , 0 + H X Y , λ Y δ Y , 0 H X Y , λ Y ϕ , 0 H X Y , ϕ ϕ , 0 1 H X Y , ϕ δ Y , 0
B λ ^ X , ULS , ψ = H X Y , λ X ψ , 0 H X Y , λ X ϕ , 0 H X Y , ϕ ϕ , 0 1 H X Y , ϕ ψ , 0 H X Y , λ X λ Y , 0 H X Y , λ X ϕ , 0 H X Y , ϕ ϕ , 0 1 H X Y , ϕ λ Y , 0 × H Y Y , λ Y λ Y , 0 + H X Y , λ Y λ Y , 0 H X Y , λ Y ϕ , 0 H X Y , ϕ ϕ , 0 1 H X Y , ϕ λ Y , 0 1 × H X Y , λ Y ψ , 0 H X Y , λ Y ϕ , 0 H X Y , ϕ ϕ , 0 1 H X Y , ϕ ψ , 0
The matrices B λ ^ Y , ULS , λ Y , B λ ^ Y , ULS , δ X , B λ ^ Y , ULS , δ Y , and B λ ^ Y , ULS , ψ can be similarly obtained. At appropriate places, the symbols “ X ” and “ Y ” in (50)–(53) must be swapped.
Now, we derive the estimating equations in the SAM approach. Only the H X X function is used to compute the loadings λ ^ X , SAM . By again relying on a Taylor expansion, we obtain
H X X , λ X λ X , 0 ( λ ^ X , SAM λ X , 0 ) = H X X , λ X δ X , 0 δ X
Using (32) and (33), we obtain, from (54),
$$ \hat{\lambda}_{X,\mathrm{SAM}} - \lambda_{X,0} = \phi_0 \, \delta_X \qquad (55) $$
For the joint ULS estimation approach, we can write
$$ \hat{\lambda}_{X,\mathrm{ULS}} - \lambda_{X,0} = - B_{\hat{\lambda}_{X,\mathrm{ULS}},\lambda_X}^{-1} B_{\hat{\lambda}_{X,\mathrm{ULS}},\delta_X} \delta_X - B_{\hat{\lambda}_{X,\mathrm{ULS}},\lambda_X}^{-1} B_{\hat{\lambda}_{X,\mathrm{ULS}},\delta_Y} \delta_Y - B_{\hat{\lambda}_{X,\mathrm{ULS}},\lambda_X}^{-1} B_{\hat{\lambda}_{X,\mathrm{ULS}},\psi} \psi . \qquad (56) $$
It could be argued that the SAM approach is more robust than ULS estimation because the SAM loading estimates depend on neither δ_Y nor ψ. However, it will become evident in the rest of this article that this belief does not hold in general. In the following Section 2.5, we demonstrate that the SAM approach coincides with the ULS estimation approach for the tau-equivalent measurement model. Hence, SAM possesses the same robustness properties to local model misspecifications as ULS in this case. As ULS is not robust to misspecifications, SAM also suffers from the lack of model robustness.

2.5. Equivalence of ULS and SAM for the Tau-Equivalent Measurement Model

We now compare ULS estimation and the two SAM approaches LSAM and GSAM in a tau-equivalent measurement model that has equal loadings. The data-generating model is given by
$$ \Lambda = \begin{pmatrix} \lambda_X & 0 \\ \lambda_X & 0 \\ \lambda_X & 0 \\ 0 & \lambda_Y \\ 0 & \lambda_Y \\ 0 & \lambda_Y \end{pmatrix}, \quad \Phi = \begin{pmatrix} 1 & \phi \\ \phi & 1 \end{pmatrix} \quad \text{and} \quad \Psi = \mathrm{diag}(\psi_{X_1}, \psi_{X_2}, \psi_{X_3}, \psi_{Y_1}, \psi_{Y_2}, \psi_{Y_3}) . \qquad (57) $$
The unique variances ψ X i and ψ Y i ( i = 1 , 2 , 3 ) can be uniquely identified from the sample (or population) variances s X i X i and s Y i Y i . The ULS fitting function is given by
$$ H(\lambda_X, \lambda_Y, \phi) = \sum_{i \neq j} ( s_{X_i X_j} - \lambda_X^2 )^2 + \sum_{i \neq j} ( s_{Y_i Y_j} - \lambda_Y^2 )^2 + \sum_{i,j} ( s_{X_i Y_j} - \phi \lambda_X \lambda_Y )^2 . \qquad (58) $$
The model parameters λ X , λ Y and ϕ can be determined by taking the corresponding partial derivatives of H defined in (58). The following estimating equations are obtained:
$$ \sum_{i \neq j} \hat{\lambda}_{X,\mathrm{ULS}} ( s_{X_i X_j} - \hat{\lambda}_{X,\mathrm{ULS}}^2 ) + \hat{\phi}_{\mathrm{ULS}} \hat{\lambda}_{Y,\mathrm{ULS}} \sum_{i,j} ( s_{X_i Y_j} - \hat{\phi}_{\mathrm{ULS}} \hat{\lambda}_{X,\mathrm{ULS}} \hat{\lambda}_{Y,\mathrm{ULS}} ) = 0 , \qquad (59) $$
$$ \sum_{i \neq j} \hat{\lambda}_{Y,\mathrm{ULS}} ( s_{Y_i Y_j} - \hat{\lambda}_{Y,\mathrm{ULS}}^2 ) + \hat{\phi}_{\mathrm{ULS}} \hat{\lambda}_{X,\mathrm{ULS}} \sum_{i,j} ( s_{X_i Y_j} - \hat{\phi}_{\mathrm{ULS}} \hat{\lambda}_{X,\mathrm{ULS}} \hat{\lambda}_{Y,\mathrm{ULS}} ) = 0 \quad \text{and} \qquad (60) $$
$$ \hat{\lambda}_{X,\mathrm{ULS}} \hat{\lambda}_{Y,\mathrm{ULS}} \sum_{i,j} ( s_{X_i Y_j} - \hat{\phi}_{\mathrm{ULS}} \hat{\lambda}_{X,\mathrm{ULS}} \hat{\lambda}_{Y,\mathrm{ULS}} ) = 0 . \qquad (61) $$
Using (61), we obtain from (59) and (60)
$$ \sum_{i \neq j} \hat{\lambda}_{X,\mathrm{ULS}} ( s_{X_i X_j} - \hat{\lambda}_{X,\mathrm{ULS}}^2 ) = 0 \quad \text{and} \qquad (62) $$
$$ \sum_{i \neq j} \hat{\lambda}_{Y,\mathrm{ULS}} ( s_{Y_i Y_j} - \hat{\lambda}_{Y,\mathrm{ULS}}^2 ) = 0 . \qquad (63) $$
Interestingly, the estimating Equations (62) and (63) for λ X and λ Y coincide with the SAM approach because they only rely on the observed covariances s X i X j and s Y i Y j that refer to the separate measurement models of the two factors. Hence, we obtain λ ^ X , SAM = λ ^ X , ULS and λ ^ Y , SAM = λ ^ Y , ULS .
If GSAM is employed, the following estimating equation is solved with respect to ϕ
$$ \hat{\lambda}_{X,\mathrm{SAM}} \hat{\lambda}_{Y,\mathrm{SAM}} \sum_{i,j} ( s_{X_i Y_j} - \hat{\phi}_{\mathrm{GSAM}} \hat{\lambda}_{X,\mathrm{SAM}} \hat{\lambda}_{Y,\mathrm{SAM}} ) = 0 . \qquad (64) $$
As the estimated loadings coincide for the ULS and the SAM approach, we infer that ϕ ^ GSAM = ϕ ^ ULS due to Equations (61) and (64). More concretely, ϕ ^ ULS and ϕ ^ GSAM are obtained by
$$ \hat{\phi}_{\mathrm{ULS}} = \hat{\phi}_{\mathrm{GSAM}} = \frac{1}{9} \sum_{i,j} \frac{s_{X_i Y_j}}{\hat{\lambda}_{X,\mathrm{ULS}} \hat{\lambda}_{Y,\mathrm{ULS}}} . \qquad (65) $$
If LSAM is employed, the mapping matrix M_LSAM = (Λ̂′_LSAM Λ̂_LSAM)^{−1} Λ̂′_LSAM is used, where
$$ \hat{\Lambda}_{\mathrm{LSAM}} = \begin{pmatrix} \hat{\lambda}_{X,\mathrm{SAM}} & 0 \\ \hat{\lambda}_{X,\mathrm{SAM}} & 0 \\ \hat{\lambda}_{X,\mathrm{SAM}} & 0 \\ 0 & \hat{\lambda}_{Y,\mathrm{SAM}} \\ 0 & \hat{\lambda}_{Y,\mathrm{SAM}} \\ 0 & \hat{\lambda}_{Y,\mathrm{SAM}} \end{pmatrix} . \qquad (66) $$
We further compute
$$ ( \hat{\Lambda}_{\mathrm{LSAM}}^\top \hat{\Lambda}_{\mathrm{LSAM}} )^{-1} = \begin{pmatrix} \hat{\lambda}_{X,\mathrm{SAM}}^{-2}/3 & 0 \\ 0 & \hat{\lambda}_{Y,\mathrm{SAM}}^{-2}/3 \end{pmatrix} \quad \text{and} \qquad (67) $$
$$ M_{\mathrm{LSAM}} = \begin{pmatrix} \hat{\lambda}_{X,\mathrm{SAM}}^{-1}/3 & \hat{\lambda}_{X,\mathrm{SAM}}^{-1}/3 & \hat{\lambda}_{X,\mathrm{SAM}}^{-1}/3 & 0 & 0 & 0 \\ 0 & 0 & 0 & \hat{\lambda}_{Y,\mathrm{SAM}}^{-1}/3 & \hat{\lambda}_{Y,\mathrm{SAM}}^{-1}/3 & \hat{\lambda}_{Y,\mathrm{SAM}}^{-1}/3 \end{pmatrix} . \qquad (68) $$
Finally, the estimated correlation matrix Φ ^ LSAM is obtained as
$$ \hat{\Phi}_{\mathrm{LSAM}} = M_{\mathrm{LSAM}} ( S - \hat{\Psi}_{\mathrm{SAM}} ) M_{\mathrm{LSAM}}^\top , \qquad (69) $$
where Ψ̂_SAM contains the estimated measurement error variances. The estimated correlation ϕ̂_LSAM is the entry of Φ̂_LSAM in the second row and first column. A direct computation of (69) shows that ϕ̂_LSAM equals ϕ̂_ULS and ϕ̂_GSAM in Equation (65). As a consequence, ULS, GSAM, and LSAM result in identical estimates for the tau-equivalent measurement model. Hence, two-step SAM cannot be more robust to local model misspecifications than one-step ULS estimation in this model. We are aware of the fact that the tau-equivalent measurement model constitutes a very special CFA model. In general, the different estimates need not be identical. However, the fact that additional robustness is not achieved in a very simple model casts doubt on the general claim that SAM is more robust than ML or ULS estimation. In Section 4, we illustrate our analytical findings through six simulation studies.
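As a quick numerical check of the equivalence derived above, the R sketch below computes, at the population level of a tau-equivalent model with one unmodelled residual correlation, the ULS solution of (58) by numerical optimization as well as the GSAM estimate (65). All numerical values are illustrative.

```r
# Sketch: ULS and (G)SAM give the same phi in the tau-equivalent model.
lamX <- 0.55; lamY <- 0.45; phi0 <- 0.6

# population covariance matrix with one unmodelled residual correlation
Lam <- cbind(rep(c(lamX, 0), each = 3), rep(c(0, lamY), each = 3))
Phi <- matrix(c(1, phi0, phi0, 1), 2, 2)
Psi <- diag(1 - rowSums(Lam %*% Phi * Lam))
Psi[4, 1] <- Psi[1, 4] <- 0.12
S <- Lam %*% Phi %*% t(Lam) + Psi

# ULS fitting function (58); i != j counts each within-block pair twice
H <- function(par) {
  lx <- par[1]; ly <- par[2]; ph <- par[3]
  SXX <- S[1:3, 1:3]; SYY <- S[4:6, 4:6]; SXY <- S[1:3, 4:6]
  2 * sum((SXX[lower.tri(SXX)] - lx^2)^2) +
    2 * sum((SYY[lower.tri(SYY)] - ly^2)^2) +
    sum((SXY - ph * lx * ly)^2)
}
est_uls <- optim(c(.5, .5, .5), H, method = "BFGS")$par

# SAM first step: tau-equivalent loadings from the separate blocks
lx_sam <- sqrt(mean(S[1:3, 1:3][lower.tri(diag(3))]))
ly_sam <- sqrt(mean(S[4:6, 4:6][lower.tri(diag(3))]))
# GSAM second step, Equation (65)
phi_gsam <- mean(S[1:3, 4:6]) / (lx_sam * ly_sam)

c(phi_ULS = est_uls[3], phi_GSAM = phi_gsam)   # essentially identical (and equally biased)
```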

3. Alternative Model-Robust Estimation Approaches

In this section, we discuss alternative estimation approaches that can handle locally misspecified factor models. In Section 3.1, we introduce robust moment estimation (RME), which allows discrepancies between observed and model-implied covariances by treating large discrepancies as outliers. In Section 3.2, RME is applied in the GSAM approach by replacing the ML or ULS fitting function with a robust loss function. Finally, Section 3.3 discusses factor rotation methods with thresholding of the error correlation matrix that simultaneously address the presence of cross loadings and residual error correlations.

3.1. Robust Moment Estimation (RME)

In the previous, Section 2, we showed that ULS and SAM estimation are not robust to model misspecifications. For K observed variables, the ULS fitting function minimizes the squared loss between observed covariances s i j and model-implied covariances σ i j ( γ ) of a model parameter γ [40]:
$$ H(\gamma) = \sum_{i=1}^{K} \sum_{j=1}^{K} ( s_{ij} - \sigma_{ij}(\gamma) )^2 . \qquad (70) $$
By taking the derivative with respect to γ in (70), one obtains the estimating equations
$$ H_{\gamma}(\gamma) = \frac{\partial H}{\partial \gamma} = -2 \sum_{i=1}^{K} \sum_{j=1}^{K} \frac{\partial \sigma_{ij}}{\partial \gamma} ( s_{ij} - \sigma_{ij}(\gamma) ) = 0 . \qquad (71) $$
The estimating Equation (71) forces a particularly defined weighted average of the model errors e_ij = s_ij − σ_ij(γ) to vanish, in which the weights are determined by the derivatives ∂σ_ij/∂γ. Typically, all covariances s_ij enter the computation of γ. However, if some model errors can be regarded as outliers, the least-squares loss function (70) can be replaced by a more model-robust loss function ρ [41], resulting in the fitting function
$$ H(\gamma) = \sum_{i=1}^{K} \sum_{j=1}^{K} \rho ( s_{ij} - \sigma_{ij}(\gamma) ) . \qquad (72) $$
The fitting function (72) is referred to as robust moment estimation (RME) because the second-order moments (i.e., covariances) are used as the input in a robust nonlinear regression with model parameter γ . Siemsen and Bollen [42] proposed ρ ( e ) = | e | (see also [43]). If model errors follow an asymmetric distribution (e.g., there are some positive errors, while all other errors are zero), the family of L p loss functions ρ ( x ) = | x | p with p [ 0 , 1 ) is even more robust to model misspecifications [37]. The power p = 0.5 has been used in invariance alignment [44] and is a good trade off between model robustness and estimation stability [45]. For CFA models, it can be suspected that a robust loss function is robust to the presence of a few unmodeled residual correlations. This property has also been demonstrated for robust estimation in the violation of measurement invariance [46].
In Equation (72), the input data are observed covariances s i j , and model-implied covariances σ i j ( γ ) are modeled. Hence, (72) can be viewed as fitting a nonlinear regression on covariances using a particular robust loss function ρ [40]. Again, the function H defines an M-estimation problem [29,31], and its use is justified by the hope that model errors s i j σ i j ( γ ) are treated as outliers by utilizing the robust loss function ρ [19,47]. Hence, the model errors should not impact the estimate of the parameter vector γ .
In numerical optimization, one can substitute the nondifferentiable function ρ by a differentiable approximation ρ ˜ ε ( x ) = ( x 2 + ε ) p / 2 using a sufficiently small positive ε such as ε = 10 4 (see [44,46,48,49] for examples). In practice, it is advisable to use reasonable starting values and to minimize (72) using a sequence of differentiable approximations with decreasing ε values (i.e., subsequently fitting ρ ˜ ε with ε = 10 1 , 10 2 , 10 3 , 10 4 and using the previously obtained result as the initial value for the subsequent minimization problem).
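A compact R sketch of RME for the two-factor CFA is shown below. It minimizes (72) with the differentiable approximation ρ̃_ε and a decreasing sequence of ε values; the parameterization, starting values, and optimizer are illustrative assumptions and the code is a simplified stand-alone illustration rather than the implementation in the sirt package.

```r
# Sketch: robust moment estimation (RME) with the L_p loss, p = 0.5.
p <- 0.5
rho_eps <- function(x, eps) (x^2 + eps)^(p / 2)   # differentiable approximation

# model-implied covariances of the simple-structure two-factor model
# gamma = (lamX1..lamX3, lamY1..lamY3, phi, psi1..psi6)
sigma_model <- function(gamma) {
  Lam <- cbind(c(gamma[1:3], 0, 0, 0), c(0, 0, 0, gamma[4:6]))
  Phi <- matrix(c(1, gamma[7], gamma[7], 1), 2, 2)
  Lam %*% Phi %*% t(Lam) + diag(gamma[8:13])
}

rme_fit <- function(S, start) {
  gamma <- start
  for (eps in 10^(-(1:4))) {                    # eps = 0.1, 0.01, 0.001, 1e-4
    fn <- function(g) sum(rho_eps(S - sigma_model(g), eps))
    gamma <- optim(gamma, fn, method = "BFGS", control = list(maxit = 500))$par
  }
  gamma
}

# illustrative population matrix with one unmodelled residual correlation
Lam <- cbind(c(.55,.55,.55,0,0,0), c(0,0,0,.45,.45,.45))
Phi <- matrix(c(1,.6,.6,1), 2, 2)
Psi <- diag(1 - rowSums(Lam %*% Phi * Lam)); Psi[4,1] <- Psi[1,4] <- 0.12
S <- Lam %*% Phi %*% t(Lam) + Psi

gamma_rme <- rme_fit(S, start = c(rep(.5, 6), .5, rep(.7, 6)))
gamma_rme[7]    # RME estimate of phi; should stay close to 0.6 despite the misspecification
```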

3.2. GSAM with Robust Moment Estimation (GSAM-RME)

The RME fitting function is a one-step estimation approach like ULS or ML estimation. GSAM determines the factor loadings and the measurement error variances in the first step and fixes them in the second step for estimating the structural parameters. In Section 2, we used ULS estimation in the second step. However, a robust loss function ρ could also be applied in GSAM. For the two-dimensional CFA model, the estimated correlation in the robust variant of GSAM (referred to as GSAM-RME) is defined as
$$ \hat{\phi}_{\mathrm{GSAM\text{-}RME}} = \arg \min_{\phi} \sum_{i,j} \rho ( s_{X_i Y_j} - \phi \, \hat{\lambda}_{X_i,\mathrm{SAM}} \hat{\lambda}_{Y_j,\mathrm{SAM}} ) . \qquad (73) $$
The advantage of GSAM-RME is that it still estimates measurement models separately, while it allows for a few outlying model errors s X i Y j ϕ λ ^ X i , SAM λ ^ Y j , SAM . If some model errors should be intentionally removed from the estimation of ϕ , GSAM-RME might be the method of choice.
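Because the loadings are fixed from the first step, GSAM-RME only requires a one-dimensional robust minimization. A minimal sketch, reusing the ρ̃_ε approximation from above and assuming that first-step SAM loading estimates are available, is:

```r
# Sketch of GSAM-RME, Equation (73): robust estimation of phi with fixed
# first-step SAM loadings (lamX_hat, lamY_hat are assumed to be available).
gsam_rme <- function(SXY, lamX_hat, lamY_hat, p = 0.5, eps = 1e-4) {
  fn <- function(phi) {
    resid <- SXY - phi * outer(lamX_hat, lamY_hat)   # cross-block model errors
    sum((resid^2 + eps)^(p / 2))
  }
  optimize(fn, interval = c(-1, 1))$minimum
}

# usage with the cross-block of the population matrix S from the previous sketch:
# phi_hat <- gsam_rme(S[1:3, 4:6], lamX_hat = c(.55,.55,.55), lamY_hat = c(.45,.45,.45))
```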
Notably, Equation (73) can be generalized to CFA models without assuming a simple structure for the loading matrix. For example, the loading matrix can be determined by exploratory factor analysis (EFA) in the first step (see next, Section 3.3). In the second step, the loadings are fixed when applying RME estimation. In this scenario, cross loadings are allowed while acknowledging the possibility of unmodelled residual error correlations by using a robust loss function. To our knowledge, the application of RME to SAM is a unique contribution of this paper.

3.3. Factor Rotation with Thresholding the Error Correlation Matrix

In the last two sections, we introduced RME and GSAM-RME estimation approaches that are aimed at treating unmodelled residual correlations as outliers. Hence, it can be suspected that these estimation alternatives are robust to this kind of local misspecification. However, there could also be cross loadings as another source of model misspecifications. In this case, assuming a simple structure for the loading matrix will lead to erroneous conclusions. There is a large literature on EFA [50] that discusses the estimation of loading matrices without a presumed simple structure.
For an observed covariance matrix S , EFA typically starts with a decomposition
$$ S = L L^\top + \Theta , \qquad (74) $$
where L is an initial loading matrix of prespecified rank D (i.e., the number of factor dimensions) and Θ is a diagonal matrix. In the next step, an oblique factor rotation is applied that estimates a D × D rotation matrix A with diag(A A′) = (1, …, 1) such that
$$ S = \Lambda \Phi \Lambda^\top + \Theta \qquad (75) $$
with Λ = L A^{−1} and Φ = A A′. The rotation matrix A is obtained by minimizing a simplicity function Q = Q(Λ). The function Q penalizes the presence of many substantial cross loadings. In this article, we focus on two simplicity functions. First, Geomin rotation [51,52] uses the criterion
$$ Q(\Lambda) = \sum_{i=1}^{K} \left( \prod_{d=1}^{D} ( \lambda_{id}^2 + \varepsilon ) \right)^{1/D} \qquad (76) $$
using a small ε > 0 (e.g., ε = 0.01 ). Recently, L p ( p ( 0 , 1 ] ) rotation for sparse loading matrices has been proposed by [53]
$$ Q(\Lambda) = \sum_{i=1}^{K} \sum_{d=1}^{D} | \lambda_{id} |^p . \qquad (77) $$
The case p = 1 was previously investigated by [54]. Again, the nondifferentiable optimization function can be replaced by a differentiable function in numerical optimization. In this article, we use p = 0.5 .
There are two alternatives to proceeding with the rotation factor solution (75). First, the estimated correlation matrix Φ in (75) can be directly used as an estimate of the factor correlation. Second, the final correlation estimate is obtained in a two-step procedure. The loading matrix Λ is used to specify an extracted loading structure that retains all entries in the loading matrix whose absolute values exceed a certain cutoff. Typically, this operation only applies to detected cross loadings. In a second step, all loadings of the reduced structure and factor correlations are estimated in a CFA. To ensure robustness against unmodelled residual correlations, this CFA model is estimated using RME (see Section 3.1).
The critical aspect of the EFA approach described above in Equation (75) is that the error covariance matrix Θ is assumed to be diagonal. Hence, biased results could be obtained in the presence of residual correlations. This possibility is tackled by a thresholding approach. Assume that, in an initial iteration t = 0 , we obtain
$$ S = L^{(0)} L^{(0)\top} + \Theta^{(0)} \qquad (78) $$
in an unrotated EFA and a diagonal residual covariance matrix Θ ( 0 ) . In a thresholding step, large non-diagonal elements (in absolute value) are detected by computing
$$ \tilde{\Theta}^{(t-1)} = \mathrm{Thresh} \big( S - L^{(t-1)} L^{(t-1)\top} \big) . \qquad (79) $$
The function Thresh is a thresholding operator that sets all small entries of a matrix, as well as its diagonal elements, to zero. Hard and soft thresholding appear in practical implementations ([53,55]; see also [56,57]). In this article, we use a modified thresholding function that behaves like soft thresholding for |x| < c and like hard thresholding for |x| > a c, for a cutoff value c and a factor a ≥ 1. The values in the interval [c, a c] are obtained by linear interpolation. This thresholding function is also known as the smoothly clipped absolute deviation (SCAD) penalty ([58]; see also [59]). The thresholding step resembles a corresponding step in regularized EFA estimation [60]. Then, in the t-th step of the algorithm, the unrotated EFA is applied to an adjusted covariance matrix S − Θ̃^(t−1), resulting in
$$ S - \tilde{\Theta}^{(t-1)} = L^{(t)} L^{(t)\top} + \Theta^{(t)} . \qquad (80) $$
The estimation of the EFA solution and the thresholding of the residual covariance matrix cycles until convergence. The thresholding step depends on a cutoff value that defines the extent of ignored residual correlations in our modified CFA approach. At convergence, one applies the oblique rotation method to the loading matrix L ( t ) and one obtains rotated loadings Λ and a factor correlation matrix Φ . Like in the EFA approach without thresholding factor covariances, the estimate Φ can be directly used, or a subsequent CFA step with extracted loading structure and a robust loss function can be employed. In large samples, the simultaneous application of oblique factor rotation and residual covariance matrix thresholding can potentially be robust to the presence of cross loadings and residual correlations.
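A schematic R implementation of this iteration is sketched below: an unrotated EFA is alternated with a SCAD-like thresholding of the residual covariance matrix until the thresholded matrix stabilizes. The threshold function, cutoff values, EFA routine (factanal), and convergence rule are illustrative assumptions; the oblique rotation of the final loading matrix would follow as described above.

```r
# Sketch: EFA with iterative thresholding of the residual covariance matrix.

# SCAD-like threshold: 0 for |x| < cutoff, linear for cutoff <= |x| <= a*cutoff, x otherwise
thresh_fun <- function(x, cutoff = 0.03, a = 1.5) {
  out <- ifelse(abs(x) < cutoff, 0,
         ifelse(abs(x) > a * cutoff, x,
                sign(x) * (abs(x) - cutoff) * a / (a - 1)))
  diag(out) <- 0            # diagonal elements are always set to zero
  out
}

efa_thresholded <- function(S, D, maxit = 50, tol = 1e-6) {
  Theta_tilde <- 0 * S
  for (t in seq_len(maxit)) {
    efa <- factanal(covmat = S - Theta_tilde, factors = D, rotation = "none")
    L   <- matrix(efa$loadings, ncol = D)
    Theta_new <- thresh_fun(S - L %*% t(L))
    if (max(abs(Theta_new - Theta_tilde)) < tol) break
    Theta_tilde <- Theta_new
  }
  list(L = L, Theta_tilde = Theta_tilde)
}

# usage: res <- efa_thresholded(S, D = 2), followed by an oblique rotation of
# res$L, e.g., GPArotation::geominQ(res$L)$loadings for Geomin rotation.
```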

4. Simulation Studies

In this section, we present findings from six simulation studies in CFA models with two, three, and five factors. In Section 4.1 (Simulation Study 1), we study the impact of residual error correlations as one kind of local model misspecification. In Section 4.2 (Simulation Study 2), we study the impact of cross loadings as another kind of model misspecification. In Section 4.3 (Simulation Study 3), we study the simultaneous presence of residual error correlations and cross loadings as local model misspecifications. In Section 4.4 (Simulation Study 4), we study a five-factor model that is very close to the original simulation in Rosseel and Loh [20]. Section 4.5 and Section 4.6 (Simulation Studies 5 and 6) compare SAM and SEM in a three-factor model with cross loadings and residual correlations, respectively.

4.1. Simulation Study 1: Correlated Residual Errors in the Two-Factor Model

In the first three simulation studies (Simulation Studies 1, 2, and 3), we restrict ourselves to a CFA model with two factors. Each factor is measured by three items. The factor correlation of the two factors is the target parameter of interest. In all simulation studies, it is assumed that the observed variables X are multivariate normally distributed. Samples of size N were drawn with independently sampled units.

4.1.1. Method

The data-generating parameters in the two-dimensional CFA model were
$$ \Lambda = \begin{pmatrix} 0.55 & 0 \\ 0.55 & 0 \\ 0.55 & 0 \\ 0 & 0.45 \\ 0 & 0.45 \\ 0 & 0.45 \end{pmatrix}, \quad \Phi = \begin{pmatrix} 1 & 0.6 \\ 0.6 & 1 \end{pmatrix} \quad \text{and} \quad \Psi = \begin{pmatrix} 0.6975 & & & & & \\ 0 & 0.6975 & & & & \\ 0 & 0 & 0.6975 & & & \\ \psi_{X_1 Y_1} & 0 & 0 & 0.7975 & & \\ 0 & \psi_{X_2 Y_2} & 0 & 0 & 0.7975 & \\ 0 & 0 & 0 & 0 & 0 & 0.7975 \end{pmatrix} , \qquad (81) $$

where only the lower triangle of the symmetric matrix Ψ is displayed.
McDonald’s Omega [61] for the two factors referring to the loadings 0.55 and 0.45 was determined as 0.57, and 0.43, respectively. Two residual correlations ψ X 1 Y 1 and ψ X 2 Y 2 are allowed in the data-generating model Ψ in (81). In the analysis model, we estimate a CFA model that allocates the items X i ( i = 1 , 2 , 3 ) to factor F X and Y i ( i = 1 , 2 , 3 ) to factor F Y and specifies a diagonal error covariance matrix. Hence, a misspecified model results. The data-generating model (81) is graphically displayed in Figure 3.
Three data-generating scenarios were simulated. First, no residual correlations were present in the data (i.e., ψ_X1Y1 = ψ_X2Y2 = 0). Second, only ψ_X1Y1 was different from zero and was set to 0.12 or −0.12. Third, both residual correlations ψ_X1Y1 and ψ_X2Y2 were set to 0.12 or −0.12.
We simulated sample sizes N = 100, 250, 500, 1000, 2500, and 10^5. The condition N = 10^5 represents the case in which there is almost no sampling error in the observed covariance matrix. This condition allows the investigation of the asymptotic bias of the competing estimation approaches.
We compared the estimated factor correlations using 15 different estimation approaches. First, we utilized the one-step estimation approaches ULS, ML, and RME using the power p = 0.5. Second, the four different SAM implementations LSAM, GSAM-ML, GSAM-ULS, and GSAM-RME were employed. The two GSAM approaches, GSAM-ML and GSAM-ULS, use ML and ULS in the second estimation step while fixing the measurement model parameters to those obtained from two single CFA models in the first step; in GSAM-RME, GSAM is applied with RME using the power p = 0.5. Third, we used Geomin and Lp rotation with p = 0.5 [53] as EFA factor rotation methods. In the methods Geomin and Lp, we directly obtained the estimated factor correlation from the EFA model without thresholding the residual covariance matrix. A thresholding approach for the residual covariance matrix with a cutoff c = 0.03 and a = 1.5 (see Section 3.3) was applied in the factor rotation methods Geomin(THR) and Lp(THR). Finally, these four rotation methods were followed by a second step in which RME for an extracted CFA model was used. Cross loadings for at most one of the three items in a scale were allowed if they exceeded 0.15 (in absolute value). Note that more cross loadings cannot be identified in a two-dimensional CFA with only three items per factor. The four resulting approaches are referred to as Geomin(RME), Lp(RME), Geomin(THR,RME), and Lp(THR,RME).
CFA estimation was conducted by applying constrained estimation for target loadings (that is, primary factor loadings) and measurement error variances. The inequality constraint p a r > 0.01 was used where p a r was a loading or variance parameter. Previous research has shown that constrained (ML) estimation is preferable to unconstrained (ML) estimation in small samples because it solves convergence issues and substantially reduces variability in estimates [62,63].
We also studied the impact of local misspecifications at the population level by applying Equation (8) in Section 2.1 for the methods ML, ULS, LSAM, GSAM-ML and RME. A value of 0.05 was applied for one of the misspecification parameters, and the impact on the estimated factor correlation was assessed. The estimate of the derivative in (8) is referred to as parameter sensitivity due to misspecification.
In each simulation condition, 1500 replications were conducted. We computed the bias, the standard deviation (SD), and the root mean square error (RMSE) for the estimated correlation. All analyses were carried out in the statistical software R [64]. ULS, ML and LSAM were computed using the most recent lavaan version [65]. Geomin rotation was conducted using the GPArotation package [66]. All other estimation functions were written by the author in R. RME also appears in the experimental mgsem() function in the sirt package [67]. The lavaan syntax for estimating ULS, ML and LSAM can be found in Appendix B. R code for Simulation Study 1 can be found at https://github.com/alexanderrobitzsch/supplement_sam/tree/main/Study1 (accessed on 22 June 2022).
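For orientation, a hedged sketch of the one-step and two-step estimation calls in lavaan is given below; it assumes the sam() function available in recent lavaan versions and a simulated data set dat as in the earlier sketches, and argument names may differ slightly across lavaan versions (the exact syntax used in this article is given in Appendix B).

```r
# Sketch of the ML, ULS, and LSAM calls in lavaan (version-dependent).
library(lavaan)

model <- ' FX =~ X1 + X2 + X3
           FY =~ Y1 + Y2 + Y3
           FX ~~ FY '

fit_ml   <- sem(model, data = dat, std.lv = TRUE, estimator = "ML")
fit_uls  <- sem(model, data = dat, std.lv = TRUE, estimator = "ULS")
# local SAM (LSAM): two-step structural-after-measurement estimation
fit_lsam <- sam(model, data = dat, sam.method = "local")

coef(fit_ml)["FX~~FY"]; coef(fit_uls)["FX~~FY"]; coef(fit_lsam)["FX~~FY"]
```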

4.1.2. Results

First, we studied the parameter sensitivity of model misspecification parameters ψ X 1 Y 1 and ψ X 2 Y 2 . They were 0.48 (ML), 0.47 (ULS), 0.45 (LSAM), 0.45 (GSAM-ML) and 0.01 (RME). The same values were obtained for the two misspecification parameters. The parameter sensitivity allows the prediction of asymptotic biases for the misspecification ψ X 1 Y 1 = 0.12 . For the ML approach, we obtain 0.12 × 0.48 = 0.06 as a bias estimate if a positive residual error correlation was assumed. For the case of two positive residual error correlations, we would expect a bias of 2 × 0.06 = 0.12 . As suspected, RME is robust to unmodelled error correlations.
In Table 1, we present the bias and the RMSE of the estimated factor correlation as a function of no, one positive or two positive residual error correlations.
As expected, ML, ULS and RME result in unbiased estimates if there are no model misspecifications (i.e., no residual error correlations in the data-generating model). RME has some efficiency loss for a correctly specified model. Notably, all SAM approaches are negatively biased for smaller samples (N = 100, 250). This casts doubt on the recommendation that SAM should be particularly appropriate in small samples. In addition, note that the EFA approaches that directly use the estimated factor correlation result in biased estimates even in large samples. The subsequent RME step tends to be more robust in the EFA approaches, but some bias remains.
As predicted, ML, ULS, and the non-robust SAM approach resulted in biased estimates with one or two unmodelled residual correlations. It seems that the SAM approaches provided less biased estimates in smaller samples. However, this result is only due to the general negative bias of SAM without misspecification in small samples. In addition, note that the approaches RME and GSAM-RME are robust to the presence of residual error correlations. If EFA methods are applied, it is important that one uses it as a two-step method with a simplified loading structure in combination with RME. However, the application of EFA in the first step always includes additional variability in the estimated correlation. The Geomin rotation had slightly superior performance than Lp rotation in terms of RMSE.
In Table 2, the SD of the estimated factor correlation is presented as a function of sample size in the data-generating model with no residual correlation. In the small sample size N = 100 , LSAM ( SD = 0.194 ) had a reduction in the SD of 6.6% compared to ML ( SD = 0.211 ). However, ULS estimation ( SD = 0.198 ) was very close to LSAM. In all larger sample sizes ( N 250 ), there was no advantage for favoring the two-step LSAM approach over the one-step methods ML or ULS. As expected, using RME estimation comes at the price of increased variability in estimates.
In Table A1 in Appendix A, the bias and the RMSE of the estimated factor correlation are presented as a function of one negative or two negative residual correlations. In contrast to the situation with positive residual correlations, LSAM is more biased than ML or ULS estimation. Ignoring a negative residual correlation resulted in a negatively biased factor correlation. Because LSAM is, in addition, generally negatively biased in small samples, its negative bias is consequently larger in magnitude than that of the SEM estimation methods ML and ULS. For positive residual correlations, the negative small-sample bias of LSAM compensated for the general positive bias caused by ignoring the error correlations. This gave the impression that LSAM had general advantages compared to ML or ULS, but this was only a consequence of positive and negative biases cancelling each other out for LSAM in particular data constellations.
Overall, using robust loss functions provided model-robust estimation in the presence of residual error correlations. As predicted by our analytical derivations, SAM is generally not more robust than ML or ULS estimation. To investigate whether our observations were generalizable across different conditions and to further study the small-sample bias of the LSAM method, we conducted three additional focused simulation studies in the data-generating model with residual correlations. These focused simulation studies are presented in the following Section 4.1.3, Section 4.1.4 and Section 4.1.5.

4.1.3. Focused Simulation Study 1A: Choice of Model Identification

In the first focused simulation study 1A, we compared two different model identification strategies for estimating factor models and investigated whether the performance of LSAM would improve when using a different strategy. In Simulation Study 1, we used the unit latent variance identification (UVI) strategy that estimates all factor loadings and fixes the variance in the latent variable to one. A popular alternative identification strategy (unit loading identification, ULI) fixes one factor loading to one and estimates the variance of the latent variable. In this simulation, we compared UVI and ULI using the data-generating model from Simulation Study 1 (see Section 4.1.1). UVI and ULI were implemented as constrained estimation approaches. We only focused on the comparison of ML, ULS and LSAM and compared all methods with bias and RMSE. The lavaan syntax for the ULI strategy can be found in Appendix C. The lavaan syntax for estimating the UVI strategy can be found in Appendix B.
Table 3 compares the estimated factor correlation in the two different identification methods. LSAM slightly improved for the sample size N = 100 when using ULI instead of UVI. In larger sample sizes, differences practically disappeared. Hence, we can conclude that the small-sample bias of LSAM persists when using a different identification strategy. Consequently, LSAM is also not more robust when utilizing ULI instead of UVI.

4.1.4. Focused Simulation Study 1B: Investigating the Small-Sample Bias of LSAM

In the second focused simulation study, 1B, we study the small-sample bias of the LSAM estimation method in more detail. We rely on the two-factor model (81) with two residual correlations. In contrast to the main Simulation Study 1, we used the same factor loading λ for all six manifest variables in the data-generating model. We varied the factor loadings as 0.4, 0.5, 0.6, 0.7, and 0.8. McDonald’s Omega [61] for these loadings was determined as 0.36, 0.50, 0.63, 0.74, and 0.84, respectively. We also varied the factor correlation ϕ as 0.0, 0.2, 0.4, 0.6, and 0.8. In this focused simulation study, we assumed a correctly specified model for studying the small-sample bias of LSAM. We report the bias of the estimated factor correlation for the sample size N = 100 .
In Table 4, the estimated factor correlation estimated by LSAM is displayed as a function of the common loading λ and the true factor correlation ϕ . It can be seen that LSAM is generally negatively biased. The bias is larger in magnitude for lower loadings (i.e., less reliably measured factors) and higher true factor correlations. From Simulation Study 1, it can be inferred that the bias of the LSAM estimate decreases in larger sample sizes.

4.1.5. Focused Simulation Study 1C: Bootstrap Bias Correction of the LSAM Method

In the third focused simulation study, 1C, we investigate whether the small-sample bias of the LSAM method can be reduced with bootstrap bias correction (BBC) methods [68,69]. We use BBC with the nonparametric bootstrap. For a sample size N, the nonparametric bootstrap is defined as sampling N units with replacement from the original sample. We use 100 bootstrap samples to determine the bias-corrected estimate. If ϕ̂ is the original estimate and ϕ̂_boot is the average of the bootstrap estimates of the same sample size, we define the bootstrap bias-corrected estimate ϕ̂_BBC as
$$ \hat{\phi}_{\mathrm{BBC}} = \hat{\phi} - ( \bar{\hat{\phi}}_{\mathrm{boot}} - \hat{\phi} ) . \qquad (82) $$
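A minimal R sketch of this nonparametric bootstrap bias correction is given below; estimate_phi() is a hypothetical placeholder standing for any of the estimation methods discussed above (e.g., LSAM).

```r
# Sketch: nonparametric bootstrap bias correction (BBC), Equation (82).
# 'estimate_phi' is a placeholder for the chosen estimator (e.g., LSAM).
bbc_phi <- function(dat, estimate_phi, R = 100) {
  phi_hat <- estimate_phi(dat)
  phi_boot <- replicate(R, {
    idx <- sample.int(nrow(dat), replace = TRUE)   # resample units with replacement
    estimate_phi(dat[idx, , drop = FALSE])
  })
  phi_hat - (mean(phi_boot) - phi_hat)             # bias-corrected estimate
}
```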
In this focused simulation study, we compare ML and LSAM estimation with and without the BBC method. We used no, one positive, and two positive residual correlations in the data-generating model.
In Table 5, the results for the four different estimation methods ML, ML (BBC), LSAM, and LSAM (BBC) are presented. For ML estimation, the results with and without BBC were quite similar in terms of bias for N 250 . However, there is a noticeably increased variability in BBC estimates (i.e., ML (BBC)) compared to ML estimates. For the LSAM estimates, it turned out that the bias was reduced with BBC. However, the bias was still 0.08 for N = 100 . The LSAM bias vanished when applying BBC for sample sizes N 250 . Interestingly, there was a substantial increase in the variability of LSAM estimates due to BBC (i.e., the SD increased from 0.19 to 0.27). However, the BBC bias reduction for LSAM does not outweigh the increased variability in terms of RMSE. Nevertheless, if LSAM were the researcher’s method of choice, we would recommend applying the small-sample bias correction method.

4.2. Simulation Study 2: Cross Loadings in the Two-Factor Model

4.2.1. Method

In this simulation study, cross loadings were simulated as local model misspecifications. The data-generating parameters were defined as
$$ \Lambda = \begin{pmatrix} 0.55 & \delta_{X_1} \\ 0.55 & 0 \\ 0.55 & 0 \\ \delta_{Y_1} & 0.45 \\ 0 & 0.45 \\ 0 & 0.45 \end{pmatrix}, \quad \Phi = \begin{pmatrix} 1 & 0.6 \\ 0.6 & 1 \end{pmatrix} \quad \text{and} \quad \Psi = \mathrm{diag}(0.6975, 0.6975, 0.6975, 0.7975, 0.7975, 0.7975) . \qquad (83) $$
The data-generating model (83) for Simulation Study 2 is graphically displayed in Figure 4.
The cross loadings δ_X1 and δ_Y1 were both set to 0.3 or −0.3 in the conditions with two positive or two negative cross loadings, respectively. In the condition with one cross loading, δ_Y1 was set to zero.
The analysis procedure exactly follows those described for Simulation Study 1 (see “Method” Section Section 4.1.1). Again, 1500 replications were used for studying bias and RMSE of the estimated factor correlation.

4.2.2. Results

First, we studied the sensitivity of the two misspecification parameters δ X 1 and δ Y 1 . It turned out that all estimation approaches that were not based on EFA resulted in expected asymptotic biases ( δ X 1 : ML: 0.40, ULS: 0.39, LSAM: 0.39, GSAM-ML: 0.39, RME: 0.52; δ Y 1 : ML: 0.49, ULS: 0.48, LSAM: 0.48, GSAM-ML: 0.48, RME: 0.63). Interestingly, RME even produced more sensitivity to misspecifications than ULS, ML, and all SAM approaches.
In Table 6, the bias and RMSE for the estimated factor correlation are presented for one and two cross loadings. As predicted, all non-EFA-based approaches resulted in biased estimates. Importantly, the SAM approaches had no superior performance regarding bias or RMSE than ML, ULS or RME. Overall, the two-step EFA approach with Geomin rotation was the frontrunner among the different methods.
Table A2 in Appendix A displays the bias and the RMSE of the estimated factor correlation for one negative or two negative cross loadings as a function of sample size for Simulation Study 2. It can be seen that LSAM was more biased than the SEM methods ML and ULS if the cross loadings were negative instead of positive. Positive cross loadings in the data-generating model make LSAM look better than the SEM methods in small samples. However, as for the residual error correlations in Simulation Study 1, this behavior is just a consequence of whether the two biases cancel or amplify each other, which depends on the data-generating parameters.

4.3. Simulation Study 3: Correlated Residual Errors and Cross Loadings in the Two-Factor Model

4.3.1. Method

In Simulation Study 3, one positive cross loading δ X 1 and one positive residual correlation ψ X 2 Y 2  were defined in the data-generating model:
$$ \Lambda = \begin{pmatrix} 0.55 & \delta_{X_1} \\ 0.55 & 0 \\ 0.55 & 0 \\ 0 & 0.45 \\ 0 & 0.45 \\ 0 & 0.45 \end{pmatrix}, \quad \Phi = \begin{pmatrix} 1 & 0.6 \\ 0.6 & 1 \end{pmatrix} \quad \text{and} \quad \Psi = \begin{pmatrix} 0.6975 & & & & & \\ 0 & 0.6975 & & & & \\ 0 & 0 & 0.6975 & & & \\ 0 & 0 & 0 & 0.7975 & & \\ 0 & \psi_{X_2 Y_2} & 0 & 0 & 0.7975 & \\ 0 & 0 & 0 & 0 & 0 & 0.7975 \end{pmatrix} , \qquad (84) $$

where only the lower triangle of the symmetric matrix Ψ is displayed.
The data-generating model (84) of Simulation Study 3 is graphically displayed in Figure 5.
The analysis procedure follows those of the two previous simulation studies (see “Method”, Section 4.1.1).

4.3.2. Results

The parameter sensitivity for the two misspecification parameters was already reported in Section 4.1.2 (for ψ_{X_2 Y_2}) and Section 4.2.2 (for δ_X1). None of ML, ULS, or RME is robust to both misspecifications simultaneously: RME is robust with respect to the residual correlation ψ_{X_2 Y_2} but not with respect to the cross loading δ_X1, whereas ML and ULS are robust to neither.
In Table 7, the bias and RMSE of the estimated factor correlation are presented. The one-step approaches and SAM resulted in positively biased estimates. The two-step EFA-based approach with Geomin rotation and subsequent RME had satisfactory performance. Overall, the findings demonstrate that SAM should not generally be preferred over ML or ULS because it does not result in less biased or more stable estimates.

4.4. Simulation Study 4: A Five-Factor Model Very Close to the Rosseel-Loh Simulation Study

In this simulation study, we investigated why we came to different conclusions than Rosseel and Loh (RL; [20]) in the previous three simulation studies. We used their simulation study involving five factors as a starting point and specified a data-generating model that closely mimics the original specification in RL. However, we did not conduct a full replication study (although RL provided detailed replication code at OSF, https://osf.io/pekbm/; accessed on 22 June 2022) for two reasons. First, RL only investigated bias and RMSE of unstandardized regression coefficients for determining relationships among factor variables. Hence, conditional associations are investigated instead of marginal factor correlations. Second, RL used the average relative bias across the different regression coefficient estimates as the criterion for determining bias in their simulation study. In such an approach, positive and negative biases can cancel each other out. In our simulation, we prefer the average absolute bias over the average bias because it better summarizes the bias in the model parameters.
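The following toy R snippet illustrates the difference between the two criteria; the bias values are made up solely for illustration and do not stem from any of the simulation studies.
# hypothetical signed biases of four regression coefficients (illustrative values)
bias <- c(0.10, -0.09, 0.02, -0.03)
mean(bias)       # average (signed) bias: 0.00 -- positive and negative biases cancel
mean(abs(bias))  # average absolute bias: 0.06 -- reveals the misestimation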

4.4.1. Method

As in RL, we specified three data-generating models involving five factors. Each of the five factors is measured by three items with primary loadings on that factor.
In Figure 6, the data-generating model of the correctly specified model (DGM1) is displayed. In this model, no residual correlations or cross loadings exist. The loadings of the items were chosen to fulfill a reliability of 0.70 according to McDonald’s omega. For each factor, the loading of the first item was larger than those of the second and third items. Hence, in contrast to Simulation Studies 1, 2, and 3, the data were not simulated based on equal factor loadings. The model parameters and resulting population-level covariance matrices of the data-generating models can be found at https://github.com/alexanderrobitzsch/supplement_sam/tree/main/Study4 (accessed on 22 June 2022).
RL [20] defined a non-saturated covariance structure for the factors. The covariance structure for factor variables F = ( F 1 , F 2 , F 3 , F 4 , F 5 ) is defined as a sequence of regression models that depend on a common regression coefficient β . This path model is given as
\[
\begin{pmatrix} F_1 \\ F_2 \\ F_3 \\ F_4 \\ F_5 \end{pmatrix} =
\begin{pmatrix}
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
\beta & \beta & 0 & \beta & 0 \\
\beta & \beta & 0 & 0 & 0 \\
\beta & 0 & \beta & \beta & 0
\end{pmatrix}
\begin{pmatrix} F_1 \\ F_2 \\ F_3 \\ F_4 \\ F_5 \end{pmatrix} +
\begin{pmatrix} \xi_1 \\ \xi_2 \\ \xi_3 \\ \xi_4 \\ \xi_5 \end{pmatrix}
\]
The residual factor variables ξ_d (d = 1, …, 5) are assumed to be independently and normally distributed. Their variances are chosen such that Var(F_d) = 1 for all factor variables F_d (d = 1, …, 5). Note that the model-implied covariance matrix Var(F) = Φ = Φ(β) is a function of the regression parameter β. The structural model for the factor variables F is graphically displayed in Figure 7.
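The following R sketch shows how Φ(β) can be computed from this recursive path model. The regression matrix B is taken from the display above, and the residual variances are solved such that all factor variances equal one; the function name phi_beta is only an illustrative label.
# model-implied factor covariance (= correlation) matrix Phi(beta)
phi_beta <- function(beta) {
  B <- matrix(0, 5, 5)
  B[3, c(1, 2, 4)] <- beta      # F3 ~ F1 + F2 + F4
  B[4, c(1, 2)]    <- beta      # F4 ~ F1 + F2
  B[5, c(1, 3, 4)] <- beta      # F5 ~ F1 + F3 + F4
  A <- solve(diag(5) - B)       # reduced form: F = A %*% xi
  # Var(F_d) = sum_k A[d,k]^2 * Var(xi_k); solve for the xi variances
  d <- solve(A^2, rep(1, 5))
  A %*% diag(d) %*% t(A)
}
round(phi_beta(0.1), 2)  # for beta = 0.1, this reproduces the correlations reported below (e.g., 0.11 for F3 and F1)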
The second data-generating model (DGM2) included three cross loadings. Each of the cross loadings was defined as 0.9 · λ_0, where λ_0 is the value of the primary loading. In Figure 8, the data-generating model with cross loadings (DGM2) is displayed. The same structural model Φ(β) for the covariance matrix of the factor variables is assumed in DGM2.
The third data-generating model (DGM3) included 20 residual correlations. Figure 9 displays DGM3.
We used the same sample sizes N = 100, 250, 1000, 2500, 5000, 10^5 as in the first three simulation studies. As in RL, we chose β = 0.1, which resulted in factor correlations ϕ_21 = 0.00, ϕ_31 = 0.11, ϕ_32 = 0.11, ϕ_41 = 0.10, ϕ_42 = 0.10, ϕ_43 = 0.12, ϕ_51 = 0.12, ϕ_52 = 0.02, ϕ_53 = 0.12, ϕ_54 = 0.12. Hence, the factor correlations were quite low and, according to Simulation Study 1B (see Section 4.1.4), the small-sample bias of LSAM was expected to be negligible.
We applied the four estimation methods ML, ULS, LSAM and RME, and we computed the average absolute bias and the average RMSE of the factor correlations. The R code used for Simulation Study 4 can be found at https://github.com/alexanderrobitzsch/supplement_sam/tree/main/Study4 (accessed on 22 June 2022).
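The two outcome measures can be computed as in the following minimal sketch, in which est is assumed to be an R × 10 matrix of estimated factor correlations across R replications and true the vector of the 10 population correlations from Φ(β).
# average absolute bias and average RMSE across the 10 factor correlations
errors       <- sweep(est, 2, true)              # estimation errors per replication
avg_abs_bias <- mean(abs(colMeans(errors)))      # average absolute bias
avg_rmse     <- mean(sqrt(colMeans(errors^2)))   # average RMSE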

4.4.2. Results

In Table 8, the average absolute bias and the average RMSE are presented as a function of sample size. In the correctly specified model (DGM1), all four estimation approaches were unbiased. The average RMSE was similar, but LSAM had slight efficiency gains compared to ML and ULS, while RME had efficiency losses compared to ML and ULS.
In the data-generating model with cross loadings (DGM2), LSAM performed best. Interestingly, we observed noticeable differences between ML and ULS estimation. Note that the variances of the manifest variables differed from each other, which could explain the different performance of the two methods. RME outperformed LSAM at the population level (or for very large sample sizes) but was similar to LSAM in finite samples.
LSAM also outperformed ML estimation for the data-generating model with residual correlations (DGM3). However, in this scenario, ULS performed better than LSAM. As expected from the simulation studies involving two factors, RME can efficiently handle the presence of residual correlations and resulted in practically unbiased estimates for sufficiently large sample sizes (i.e., N ≥ 500). Notably, RME was even the frontrunner in terms of RMSE for all sample sizes.
In future research, the different performances of ML, ULS and LSAM should be investigated in more detail. There is evidence from the literature that ULS could have advantages over ML for misspecified factor models [26], although these findings certainly depend on the data-generating scenarios.

4.4.3. Focused Simulation Study 4A: Varying the Size of Factor Correlations

Simulation Study 4 used relatively low factor correlations in the data-generating model. In this focused simulation study, 4A, we study the performance of the ML, ULS, LSAM, and RME approaches for the three data-generating models of Simulation Study 4 at the population level (i.e., using the true covariance matrix as the input data). Moreover, we varied the size of the factor correlations Φ(β) by specifying the levels β = 0.1, 0.2, 0.3, 0.4. For example, the resulting factor correlations for β = 0.4 were ϕ_21 = 0.00, ϕ_31 = 0.44, ϕ_32 = 0.44, ϕ_41 = 0.37, ϕ_42 = 0.37, ϕ_43 = 0.58, ϕ_51 = 0.55, ϕ_52 = 0.27, ϕ_53 = 0.66, ϕ_54 = 0.62.
In Table 9, the average absolute bias of the factor correlations is presented for the four estimation approaches and the three data-generating models as a function of the common regression coefficient β. In the case of the correctly specified model (DGM1), all methods performed satisfactorily. The biases of ML, ULS, and LSAM increased with increasing β values in the data-generating model with cross loadings (DGM2). Notably, RME was strongly biased for β ≥ 0.2 and cannot adequately handle cross loadings if the factor correlations are not very small. Across all four conditions in DGM2, LSAM would be preferred over ML. In the data-generating model with residual correlations (DGM3), RME was unbiased. ULS had some advantages compared to LSAM and ML.
Overall, we can state that the conclusion in RL that LSAM is more robust than SEM (ML or ULS) does not hold in data constellations close to their conditions. LSAM has advantages over ML, but this does not generally hold when it is compared with ULS. Moreover, although LSAM can reduce the bias of ML estimation (DGM2 for β = 0.4: 38.5% bias reduction from 0.13 to 0.08; DGM3 for β = 0.4: 18.8% bias reduction from 0.16 to 0.13), it is not robust (i.e., unbiased) to local model misspecifications.

4.5. Simulation Study 5: Comparing SAM and SEM in a Three-Factor Model with Cross Loadings

The first three Simulation Studies 1, 2, and 3 involving two factors came to slightly different findings than Simulation Study 4, which involved five factors. In Simulation Study 5, we investigate whether the degree of robustness of SAM relative to SEM depends on the choice of data-generating parameters. To do so, we chose a three-factor model with cross loadings as the data-generating model (see Figure 10). All factor loadings referring to the same factor were equal. The factor loadings were chosen as λ_1 = 0.6, λ_2 = 0.65, and λ_3 = 0.55 for the three factors, respectively. The cross loading δ_1 was fixed to 0.4.
In order to study a wide range of scenarios, we varied the size of the cross loadings δ_2 and δ_3 and the size of the factor correlations ϕ_21, ϕ_31, and ϕ_32. As the variances of the manifest variables were quite similar and close to the variances chosen in the two-factor model simulations, we only compared ULS with LSAM because we expected ML to perform very similarly to ULS (see the results of Simulation Studies 1, 2, and 3). In the data-generating model, the two cross loadings δ_2 and δ_3 could each be either 0 or 0.4. The three factor correlations could vary with levels 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, and 0.6. In total, there are 2 (δ_2) × 2 (δ_3) × 7 (ϕ_21) × 7 (ϕ_31) × 7 (ϕ_32) = 1372 combinations of conditions.
We conducted the analysis at the population level (i.e., we used the true covariance matrix as the input). We then assessed whether SAM was superior to SEM (i.e., ULS) in terms of bias. We define that a factor correlation estimate has “No bias” if both estimates (ULS and LSAM) had absolute biases smaller than 0.05. We declared SAM superior to SEM (“SAM better”) if ULS was biased and SAM had a minimum percentage error of 20% compared to ULS. We declared SEM superior to SAM (“SEM better”) if LSAM was biased and ULs had a minimum percentage error of 20% compared to LSAM. In all other cases, we say that both estimates are biased (“Both biased”). We assessed the percentage values of data constellations in which each of the four events occurred. R code for Simulation Study 5 can be found at https://github.com/alexanderrobitzsch/supplement_sam/tree/main/Study5 (accessed on 22 June 2022).
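A minimal R sketch of this classification rule is given below. The interpretation of the 20% criterion as a relative reduction in absolute bias is an assumption about the exact operationalization, and the example input values are made up.
classify <- function(bias_uls, bias_lsam, cutoff = 0.05, rel = 0.20) {
  a_uls <- abs(bias_uls); a_lsam <- abs(bias_lsam)
  if (a_uls < cutoff && a_lsam < cutoff) return("No bias")
  if (a_uls >= cutoff && a_lsam <= (1 - rel) * a_uls) return("SAM better")
  if (a_lsam >= cutoff && a_uls <= (1 - rel) * a_lsam) return("SEM better")
  "Both biased"
}
classify(bias_uls = 0.12, bias_lsam = 0.04)   # "SAM better"
classify(bias_uls = 0.10, bias_lsam = 0.09)   # "Both biased"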
In Table 10, the percentages for the four events are displayed. Across all conditions (“all factor correlations”), the chance that both estimates were unbiased was higher if both cross loadings δ_2 and δ_3 were zero. SAM was only superior to SEM if the second and third cross loadings were zero. As an additional analysis, we only investigated data constellations in which all factor correlations were at most 0.20 (i.e., small correlations) or at least 0.30 (i.e., large correlations). For small correlations, a noticeable percentage of data constellations was detected in which SAM was preferred over SEM. In fact, in the case of small correlations, SEM was never better than SAM. The situation changed for large factor correlations. In this situation, SEM (i.e., ULS) was preferred over SAM in some of the data constellations. Hence, we think that these findings demonstrate that one cannot state that SAM is generally more robust than SEM.

4.6. Simulation Study 6: Comparing SAM and SEM in a Three-Factor Model with Residual Correlations

In the last simulation study (Simulation Study 6), we again rely on a three-factor model with the factor loading specifications of Simulation Study 5 (see Section 4.5) but specify three common residual correlations instead of cross loadings. The data-generating model is graphically displayed in Figure 11.
We chose the common residual correlation ψ as 0.2 or 0.4. As in Simulation Study 5, seven levels (i.e., 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6) were chosen for the factor correlations ϕ_21, ϕ_31, and ϕ_32. In total, there are 2 (ψ) × 7 (ϕ_21) × 7 (ϕ_31) × 7 (ϕ_32) = 686 combinations of conditions. We used the same four events as in Simulation Study 5 to assess whether SAM is preferred over SEM or the other way around. As in Simulation Study 5, we only compared ULS (SEM) with LSAM (SAM).
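For concreteness, the full factorial grid of conditions can be set up as in the following sketch; the variable names are illustrative.
# condition grid for Simulation Study 6 (population-level analysis)
grid <- expand.grid(psi   = c(0.2, 0.4),
                    phi21 = seq(0, 0.6, by = 0.1),
                    phi31 = seq(0, 0.6, by = 0.1),
                    phi32 = seq(0, 0.6, by = 0.1))
nrow(grid)   # 686 combinations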
In Table 11, the percentages of the four events are presented. Across all sizes of factor correlations, we did not observe noticeable differences between SAM and SEM. This situation remained unchanged when focusing only on data constellations with small or large factor correlations, respectively. Hence, the findings from this simulation involving three factors were very similar to those of Simulation Study 1 involving two factors because we did not observe differences between SEM and SAM at the population level in either study.

5. Discussion

In this article, we compared the recently proposed two-step SAM approach with alternative estimation approaches. It turned out that SAM is not more robust to local model misspecifications in cross loadings or residual error correlations than the frequently employed ML and ULS estimation methods. The only situation in which SAM is expected to be more robust than ML and ULS is the estimation of a non-saturated structural model. In this case, misspecifications in the structural model enter the estimation of the measurement models in one-step estimation approaches.
As an alternative to SAM, ML, and ULS, we studied RME with the robust L_p loss functions ρ(x) = |x|^p for p ∈ (0, 1]. It was demonstrated that RME is robust to unmodelled residual correlations but does not possess such a robustness property in the presence of unmodelled cross loadings. Moreover, RME can also be applied in two-step estimation approaches in GSAM. In addition, RME can be applied in multiple-group estimation under violations of measurement invariance, where it mimics the estimation of model parameters under partial invariance [37].
We also studied EFA-based estimation approaches that aim at exploring the factor loading structure in the presence of cross loadings. These methods work well in large samples but, at the same time, introduce bias in smaller samples. It would be an interesting topic for future research to investigate whether this bias could be reduced by resampling-based bias correction methods such as the bootstrap or jackknife [69].
Rosseel and Loh [20] emphasize that separating the estimation of the measurement and structural parts in the SAM approach offers three advantages over one-step estimation such as ML or ULS. They note that “First, estimates are more robust against local model misspecifications. Second, estimation routines are less vulnerable to convergence issues in small samples. Third, estimates exhibit smaller finite sample biases under correctly specified models”. We respectfully disagree with all three arguments. First, we demonstrated with analytical and numerical illustrations that SAM is not generally more robust than ML and ULS if the local model misspecifications pertain to cross loadings and residual correlations. However, it was shown that SAM slightly outperformed SEM in factor models with small factor correlations. The statement of a larger degree of robustness only applies to misspecifications in the structural model, which was not clearly mentioned in [20]. Second, it was claimed that SAM has fewer convergence issues than ML or ULS. This is undoubtedly true for unconstrained ML estimation. However, convergence issues disappear with constrained ML estimation that only allows positive loadings and positive variances. As shown in our simulation studies, constrained ML estimation also has lower variability in terms of RMSE. Importantly, the variability of parameter estimates also depends on the chosen parametrization of the CFA model. Third, the reported finite-sample bias in ML is an artifact of only using unconstrained ML. With constrained ML, we did not observe notable biases in small samples. In contrast, SAM showed substantial biases in small sample sizes and should be particularly avoided in this situation. Unfortunately, this property was not mentioned in [20]. However, we have shown that bootstrap bias correction partially removes the small-sample bias of LSAM estimation.
Our discussion could be read as a harsh critique of using SAM for SEM in applied research. However, we believe the opposite is true, as explained below. Our motivation for writing this article was to point out that SAM should not be advertised for the wrong reasons. As SAM is implemented in the widespread R package lavaan, we expect that it will reach a wide audience of applied researchers, and it is essential to understand the limitations of SAM.

5.1. Regularized Estimation and Misspecified Models

Regularized estimation of SEMs might be a more direct approach for modeling misspecifications in cross loadings and residual correlations [60,70,71,72,73,74,75]. As argued in [53], in regularized estimation, the search for an optimal regularization parameter might even be avoided because choosing a prespecified sufficiently small parameter still ensures identified models, reduces bias and only introduces additional variability in estimates. However, regularized estimation almost always relies on sparsity assumptions in cross loadings and residual correlations that might not be plausible in practical applications.
As an alternative to the sparsity assumption for model misspecifications, ridge penalties on residual correlations [76,77] or cross loadings [78,79] might be specified. Note that regularized estimation with ridge penalties corresponds to Bayesian estimation with normal prior distributions.

5.2. Why the SAM Approach Should Generally Be Preferred over SEM

In this section, we argue why researchers should, in general, follow the SAM approach or a two-step approach to SEM [80,81,82,83,84]. We are convinced that latent variables (i.e., factors) should be defined using a prespecified set of indicators (i.e., items). Items referring to other factors should not influence the meaning of the factor of interest. In this sense, it is clear that we prefer the specification of separate measurement models, and items should be allocated to factors solely based on substantive theory. Ideally, each item should be a priori assigned to only one factor. SAM shares with other two-step estimation methods the advantage that it separates the definition of measurement models from the estimation of structural models [85,86,87,88]. In contrast, one-step SEM estimation methods confound the definition and estimation of the measurement model and the structural model.
From our perspective, the meaning of factors remains entirely unclear if they are extracted from EFA. If measurement models are separately specified, it is likely that local model misspecifications arise in the SAM approach. Note that model misspecifications (i.e., model misfit) become visible in the GSAM approach but are hidden in the LSAM approach. Typically, cross loadings or residual correlations could be empirically detected. However, if they were included in the model, the meaning of the latent factors would change. It is unclear why an estimated correlation between two factors (i.e., the target estimand of interest) in a modified model that includes cross loadings and has an acceptable model fit should be less biased than in a model without cross loadings and with a poor model fit (see [89] for a prototypical example). We want to point out that we believe that the concept of bias in empirical applications should not be tied to a correct model specification. In contrast, a misspecified SEM is purposely fitted in practical applications, and the allocation of items to factors can allow unbiased estimation of relationships between factors. Hence, in contrast to what is frequently found in the psychometric literature, we believe that a modified SEM with an improved model fit but included cross loadings and residual correlations will result in biased estimates because it redefines the latent variables. More critically, why should a statistical (or psychometric) model redefine the meaning of a variable in a model [90]? Should we believe that psychometrics could replace the difficult question of defining how to measure a construct [91]?
We argued above that it is adequate to use a factor correlation from a misspecified SEM without cross loadings or residual correlations instead of using the factor correlation from a well-fitting model with empirically derived cross loadings and residual correlations. This means that we define the parameter of interest based on the misspecified model. Using the factor correlation from the well-fitting model would result in a biased estimate because we would choose the wrong parameter of interest. We expect that researchers will typically disagree on which parameter should define the truth, although we observe that model fit plays an important role in defining parameters of interest in applied and psychometric research. A similar situation arises if one applies factor models to questionnaire items on a Likert scale (say, with item scores 1, 2, 3, and 4). In this situation, we can apply an ordinal factor model that relies on polychoric correlations or use Pearson correlations, which do not change the metric of the items. Whether one of the two methods is biased depends on what researchers define as the true model. In psychometrics, factor models based on polychoric correlations seem to be preferred over those relying on Pearson correlations [92,93,94]. We argued that it is equally defensible to define the factor model parameters based on Pearson correlations as those of interest [95,96]. In this situation, different researchers will disagree about what is termed bias in factor models because they rely on different definitions of the truth. Note that these bias considerations are mainly relevant for population-level data. This kind of bias might be labeled as estimation error or transformation error, while the bias of an estimator in finite samples is labeled as sampling error [93].
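In lavaan, the two analysis choices for Likert-type items can be contrasted as in the following minimal sketch; the data set dat with items X1–X3 is an assumption made only for illustration.
library(lavaan)
model <- "FX =~ X1 + X2 + X3"
# treat item scores as continuous: Pearson covariances, ML estimation
fit_cont <- lavaan::cfa(model, data = dat)
# treat item scores as ordinal: polychoric correlations, WLSMV estimation
fit_ord  <- lavaan::cfa(model, data = dat, ordered = c("X1", "X2", "X3"))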
If a structural model is fitted in the presence of model misspecifications, researchers intentionally weigh the model errors e_ij = s_ij − σ_ij(γ) of variables i and j when modeling covariances by choosing a particular loss function. ML estimation only achieves the most efficient estimates if the model is correctly specified. As this will almost never be true, and searching for a well-fitting SEM conflicts with the research question, ULS estimation, which weighs all model errors equally, might be more plausible if all deviations should equally enter the estimation [26]. In a linear regression model, researchers would typically not weigh observations according to the size of the residuals in the model. Why should researchers do this in SEM when utilizing ML estimation? Of course, in ULS, all variables should be defined in a comparable metric or be previously standardized. If some outlying model errors e_ij should not impact the factor correlations, RME can be used. However, using a different fitting function potentially results in defining a different estimand of interest.

5.3. Why We Do Not Bother about a Violation of Measurement Invariance Due to Model Misspecifications

The presence of cross loadings or residual correlations indicates a violation of measurement invariance [97,98]. For example, the measurement model parameters of items X_i of a factor F_X are not invariant if cross loadings δ_Xi or residual correlations ψ_XiYj differ from zero. In this case, the conditional expectation E(X_i | F_X, F_Y, Y_j) also depends on the factor F_Y and the item Y_j. Under violations of measurement invariance, relationships between variables would change if only a subset of items were used for fitting an SEM. We believe that this property has only minor relevance for typical research questions because the questionnaire (and, hence, the set of items) is held fixed in analyses, and invariance with respect to subsets of items is unimportant. To us, it is unclear why relationships must only emerge through factor variables and not through single items if measurement invariance holds. In contrast, we think that the items always define the latent factor [99,100], and testing measurement invariance should not play a role in studying relationships between factors (see also [101,102,103]).

5.4. Why We Should Not Rely on a Factor Model in Two-Step SEM Estimation

As argued above, we favor a two-step estimation approach such as SAM. The first step consists of estimating the parameters of the measurement model, which is typically a one-dimensional factor model. We are less convinced that factor models offer many advantages over unweighted or weighted sum scores in which the weights are a priori specified [102]. In our view, it remains unclear why items should be weighted according to their fit to a factor model when defining a latent variable. If a questionnaire is created targeting a particular construct, we would prefer using an unweighted sum score. For estimating the structural model, the only question is how to determine the reliability of the sum score. The reliability of the sum score in a one-dimensional factor model can be determined using McDonald’s omega [61,104]. Alternatively, one can also compute Revelle’s omega total as the reliability of the sum score based on a multidimensional factor model [105]. However, we think that Cronbach’s alpha [106] or its stratified variants [107] offer conceptual advantages because they do not rely on a factor model. In contrast, they are only based on an exchangeability assumption for items (see also [108,109]), and items can be regarded as random instead of fixed ([110,111]; but see also [112]). Unfortunately, it is frequently noted that the computation of Cronbach’s alpha would require that a one-dimensional factor model fits the data [113], without referring to the origins of the exchangeability concept behind Cronbach’s alpha [106]. If a sum score and an associated reliability are available, structural relationships among latent variables can be estimated using single-indicator models.
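The following R sketch illustrates this two-step strategy with sum scores. It is a minimal sketch, assuming a data set dat with items X1–X3 and Y1–Y3; the choice of Cronbach’s alpha as the reliability estimate follows the reasoning above, and the fixed-error-variance single-indicator specification is one common way to implement the second step.
library(lavaan)
# Cronbach's alpha computed from the item covariance matrix
alpha_rel <- function(items) {
  k <- ncol(items); C <- cov(items)
  k / (k - 1) * (1 - sum(diag(C)) / sum(C))
}
# step 1: sum scores and their reliabilities
dat$sx <- rowSums(dat[, c("X1", "X2", "X3")])
dat$sy <- rowSums(dat[, c("Y1", "Y2", "Y3")])
rel_x  <- alpha_rel(dat[, c("X1", "X2", "X3")])
rel_y  <- alpha_rel(dat[, c("Y1", "Y2", "Y3")])
# step 2: single-indicator model; error variances fixed to (1 - reliability) * Var(score)
lavmodel <- paste0("
  FX =~ 1*sx
  FY =~ 1*sy
  sx ~~ ", (1 - rel_x) * var(dat$sx), "*sx
  sy ~~ ", (1 - rel_y) * var(dat$sy), "*sy
  FX ~~ FY
")
fit <- lavaan::sem(lavmodel, data = dat)
standardizedSolution(fit)   # latent correlation between FX and FY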
We also feel that it is not apparent why the reliability of a score must necessarily be defined based on the concept of internal consistency [114]. Moreover, almost all psychometric modeling is devoted to unreliability, which refers to classical measurement error. However, limited validity typically corresponds to a Berkson error [115] that requires different modeling strategies [116,117].

5.5. Why We Should Report Model Errors as Additional Parameter Uncertainty

Finally, we argued that local model misspecifications typically cannot be avoided in SEM applications. In contrast, attempts to avoid misspecifications by introducing additional model parameters will result in a different meaning of the parameters of interest (i.e., different target estimands), which produces biased parameter estimates. Researchers could still report effect sizes of the model misfit in an SEM [118,119,120,121]. We tend to prefer a different approach that incorporates model errors in the statistical inference of model parameters. The Wu–Browne approach [122] results in increased standard errors of the estimated SEM parameters by parameterizing the model error with a particular distribution (see also [123]). We think that this strategy should become standard in practical SEM applications and implementations.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Acknowledgments

I would like to thank three anonymous reviewers, the academic editor, Oliver Lüdtke and Yves Rosseel for numerous constructive comments on a previous version of the article (https://psyarxiv.com/ry8za/; ref. [124]; accessed on 22 June 2022) that helped to improve this paper.

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

    The following abbreviations are used in this manuscript:
BBC    bootstrap bias correction
CFA    confirmatory factor analysis
EFA    exploratory factor analysis
GSAM   global structural after measurement
LSAM   local structural after measurement
ML     maximum likelihood
RL     Rosseel and Loh
RME    robust moment estimation
RMSE   root mean square error
SAM    structural after measurement
SCAD   smoothly clipped absolute deviation
SD     standard deviation
SEM    structural equation model
ULI    unit loading identification
UVI    unit latent variance identification
ULS    unweighted least squares

Appendix A. Additional Results for Simulation Studies

Table A1 displays the bias and the RMSE of the estimated factor correlation for one negative or two negative residual correlations as a function of sample size for Simulation Study 1.
Table A2 displays the bias and the RMSE of the estimated factor correlation for one negative or two negative cross loadings as a function of sample size for Simulation Study 2.
Table A1. Simulation Study 1 (two factors and residual correlations): bias and root mean square error (RMSE) of estimated factor correlation as a function of sample size with one negative and two negative residual correlations in the data-generating model.
Method | Bias (N = 100, 250, 500, 1000, 2500, 10^5) | RMSE (N = 100, 250, 500, 1000, 2500, 10^5)
One negative residual correlation
ML−0.06−0.05−0.04−0.05−0.05−0.05 0.23 0.14 0.10 0.08 0.06 0.05
ULS−0.04−0.04−0.04−0.05−0.05−0.05 0.21 0.13 0.09 0.08 0.06 0.05
RME−0.04−0.02−0.01−0.01−0.01 0.00 0.23 0.15 0.11 0.07 0.05 0.01
LSAM−0.20−0.11−0.08−0.07−0.06−0.05 0.28 0.17 0.12 0.09 0.07 0.05
GSAM-ML−0.20−0.11−0.08−0.07−0.06−0.05 0.28 0.17 0.12 0.09 0.07 0.05
GSAM-ULS−0.16−0.09−0.07−0.06−0.06−0.05 0.25 0.15 0.11 0.09 0.07 0.05
GSAM-RME−0.16−0.08−0.04−0.03−0.01 0.00 0.28 0.17 0.11 0.08 0.05 0.01
Geomin−0.36−0.31−0.29−0.27−0.26−0.27 0.39 0.33 0.30 0.28 0.27 0.27
Lp−0.37−0.30−0.26−0.22−0.19−0.16 0.39 0.33 0.28 0.24 0.20 0.16
Geomin(THR)−0.35−0.30−0.26−0.25−0.24−0.27 0.37 0.31 0.28 0.26 0.26 0.27
Lp(THR)−0.36−0.29−0.25−0.21−0.18−0.16 0.38 0.32 0.27 0.23 0.20 0.16
Geomin(RME)−0.13−0.09−0.08−0.05−0.03−0.01 0.28 0.20 0.14 0.09 0.05 0.01
Lp(RME)−0.05 0.01 0.03 0.01 0.00 0.00 0.29 0.20 0.15 0.14 0.14 0.03
Geomin(THR,RME)−0.12−0.09−0.07−0.05−0.03−0.01 0.28 0.20 0.14 0.09 0.05 0.01
Lp(THR,RME)−0.03 0.03 0.04 0.02 0.00 0.00 0.28 0.20 0.15 0.14 0.14 0.03
Two negative residual correlations
ML−0.11−0.10−0.10−0.10−0.10−0.10 0.25 0.16 0.13 0.12 0.11 0.10
ULS−0.09−0.09−0.10−0.10−0.10−0.10 0.23 0.15 0.13 0.12 0.11 0.10
RME−0.07−0.06−0.04−0.03−0.02 0.00 0.25 0.17 0.12 0.08 0.05 0.01
LSAM−0.23−0.16−0.13−0.12−0.11−0.11 0.30 0.20 0.16 0.14 0.12 0.11
GSAM-ML−0.23−0.16−0.13−0.12−0.11−0.11 0.30 0.20 0.16 0.14 0.12 0.11
GSAM-ULS−0.20−0.14−0.12−0.12−0.11−0.11 0.27 0.19 0.15 0.13 0.12 0.11
GSAM-RME−0.19−0.11−0.07−0.05−0.02 0.00 0.29 0.19 0.13 0.09 0.05 0.01
Geomin−0.40−0.37−0.34−0.32−0.29−0.21 0.42 0.38 0.35 0.33 0.30 0.21
Lp−0.38−0.34−0.30−0.26−0.23−0.23 0.40 0.36 0.32 0.28 0.24 0.23
Geomin(THR)−0.38−0.33−0.30−0.28−0.25−0.09 0.40 0.35 0.31 0.30 0.27 0.14
Lp(THR)−0.38−0.31−0.27−0.23−0.18−0.04 0.40 0.34 0.30 0.25 0.21 0.08
Geomin(RME)−0.18−0.12−0.09−0.07−0.04 0.00 0.32 0.23 0.17 0.12 0.07 0.01
Lp(RME)−0.07−0.04−0.02 0.01 0.01 0.03 0.31 0.22 0.17 0.13 0.10 0.10
Geomin(THR,RME)−0.16−0.13−0.10−0.08−0.04−0.01 0.32 0.23 0.16 0.11 0.06 0.01
Lp(THR,RME)−0.06 0.00 0.01 0.01 0.00−0.02 0.29 0.22 0.17 0.14 0.12 0.10
Note. N = sample size; Method = estimation method; ML = maximum likelihood; ULS = unweighted least squares; RME = robust moment estimation with p = 0.5; SAM = structural after measurement; LSAM = local SAM; GSAM = global SAM; Geomin = Geomin factor rotation; Lp = Lp factor rotation with p = 0.5; THR = thresholding error covariance matrix; Geomin(RME) = Geomin factor rotation with subsequent RME; Lp(RME) = Lp factor rotation with subsequent RME; absolute biases larger than 0.05 are printed in bold.
Table A2. Simulation Study 2 (two factors and cross loadings): bias and root mean square error (RMSE) of estimated factor correlation as a function of sample size with one and two negative cross loadings in the data-generating model.
Method | Bias (N = 100, 250, 500, 1000, 2500, 10^5) | RMSE (N = 100, 250, 500, 1000, 2500, 10^5)
One negative cross-loading
ML−0.11−0.08−0.08−0.08−0.08−0.08 0.26 0.16 0.12 0.10 0.09 0.08
ULS−0.08−0.08−0.08−0.08−0.08−0.08 0.24 0.16 0.12 0.10 0.09 0.08
RME−0.07−0.05−0.04−0.03−0.02 0.00 0.25 0.17 0.12 0.08 0.05 0.01
LSAM−0.25−0.16−0.12−0.10−0.10−0.09 0.32 0.21 0.15 0.12 0.11 0.09
GSAM-ML−0.25−0.16−0.12−0.10−0.10−0.09 0.32 0.21 0.15 0.12 0.11 0.09
GSAM-ULS−0.22−0.14−0.12−0.10−0.10−0.10 0.29 0.19 0.15 0.12 0.11 0.10
GSAM-RME−0.22−0.13−0.09−0.06−0.03−0.01 0.32 0.21 0.15 0.10 0.06 0.01
Geomin−0.39−0.31−0.25−0.21−0.18−0.16 0.41 0.33 0.27 0.23 0.18 0.16
Lp−0.40−0.33−0.27−0.22−0.17−0.01 0.42 0.36 0.31 0.28 0.25 0.02
Geomin(THR)−0.39−0.32−0.26−0.22−0.18−0.16 0.40 0.34 0.28 0.23 0.19 0.16
Lp(THR)−0.40−0.33−0.28−0.22−0.17−0.01 0.42 0.36 0.32 0.28 0.25 0.02
Geomin(RME)−0.18−0.15−0.13−0.12−0.11−0.03 0.35 0.29 0.26 0.27 0.26 0.10
Lp(RME)−0.17−0.14−0.15−0.15−0.16−0.14 0.35 0.29 0.30 0.28 0.29 0.19
Geomin(THR,RME)−0.16−0.14−0.13−0.13−0.11−0.03 0.34 0.29 0.27 0.28 0.26 0.10
Lp(THR,RME)−0.15−0.14−0.15−0.15−0.16−0.14 0.35 0.30 0.29 0.27 0.29 0.19
Two negative cross-loadings
ML−0.22−0.19−0.17−0.17−0.16−0.16 0.37 0.27 0.21 0.19 0.17 0.16
ULS−0.19−0.17−0.16−0.17−0.17−0.17 0.35 0.25 0.20 0.19 0.17 0.17
RME−0.15−0.13−0.09−0.06−0.03 0.00 0.33 0.24 0.18 0.12 0.07 0.01
LSAM−0.36−0.29−0.25−0.22−0.21−0.20 0.42 0.33 0.27 0.24 0.21 0.20
GSAM-ML−0.36−0.29−0.25−0.22−0.21−0.20 0.42 0.33 0.27 0.24 0.21 0.20
GSAM-ULS−0.32−0.28−0.25−0.23−0.22−0.21 0.84 0.32 0.27 0.24 0.23 0.21
GSAM-RME−0.33−0.27−0.21−0.14−0.08−0.01 0.76 0.33 0.26 0.19 0.11 0.02
Geomin−0.44−0.41−0.38−0.36−0.34−0.30 0.45 0.42 0.39 0.37 0.34 0.30
Lp−0.43−0.41−0.40−0.41−0.46−0.53 0.45 0.43 0.43 0.45 0.49 0.53
Geomin(THR)−0.43−0.41−0.39−0.36−0.34−0.30 0.45 0.43 0.40 0.38 0.34 0.30
Lp(THR)−0.43−0.42−0.40−0.41−0.46−0.53 0.45 0.43 0.43 0.45 0.49 0.53
Geomin(RME)−0.26−0.24−0.20−0.12−0.05−0.01 0.41 0.34 0.26 0.17 0.10 0.01
Lp(RME)−0.19−0.17−0.17−0.18−0.19−0.07 0.40 0.34 0.32 0.31 0.30 0.20
Geomin(THR,RME)−0.25−0.23−0.20−0.13−0.05−0.01 0.39 0.34 0.27 0.18 0.10 0.01
Lp(THR,RME)−0.18−0.16−0.18−0.18−0.19−0.07 0.41 0.34 0.33 0.32 0.30 0.20
Note. N = sample size; Method = estimation method; ML = maximum likelihood; ULS = unweighted least squares; RME = robust moment estimation with p = 0.5; SAM = structural after measurement; LSAM = local SAM; GSAM = global SAM; Geomin = Geomin factor rotation; Lp = Lp factor rotation with p = 0.5; THR = thresholding error covariance matrix; Geomin(RME) = Geomin factor rotation with subsequent RME; Lp(RME) = Lp factor rotation with subsequent RME.

Appendix B. lavaan Syntax for Model Estimation

In the following, we present lavaan syntax for estimating the two-factor model used in the simulation studies in this article. The model is estimated using a unit latent variance identification (UVI) strategy in which the latent variable is identified by fixing the variance of the latent variable to one while freely estimating all factor loadings.
## define lavaan syntax
lavmodel <- "
  FX =~ l1*X1 + l2*X2 + l3*X3
  X1 ~~ vX1*X1
  X2 ~~ vX2*X2
  X3 ~~ vX3*X3
  FX ~~ 1*FX
  FY =~ l4*Y1 + l5*Y2 + l6*Y3
  FY ~~ 1*FY
  FX ~~ phi*FY
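  # inequality constraints: loadings and residual variances are bounded away from
  # zero, and the factor correlation phi is bounded within (-0.99, 0.99)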
  l1 > 0.01
  l2 > 0.01
  l3 > 0.01
  l4 > 0.01
  l5 > 0.01
  l6 > 0.01
  vX1 > 0.01
  vX2 > 0.01
  vX3 > 0.01
  Y1 ~~ vY1*Y1
  Y2 ~~ vY2*Y2
  Y3 ~~ vY3*Y3
  vY1 > 0.01
  vY2 > 0.01
  vY3 > 0.01
  phi < 0.99
  phi > -0.99
  "
# estimate SEM using covariance matrix S
mod1 <- lavaan::sem(lavmodel, sample.cov=S, sample.nobs=1e5,
              estimator="ML", std.lv=TRUE)
# estimate LSAM
mod2 <- lavaan::sam(lavmodel, sample.cov=S, sample.nobs=1e5,
              sam.method="local", std.lv=TRUE,
              local.options = list(M.method = "ML" ) )

Appendix C. lavaan Syntax for ULI Estimation in Focused Simulation Study 1A

In the following, we present the lavaan syntax for estimating the two-factor model in Focused Simulation Study 1A.
## define lavaan syntax
lavmodel <- "
  FX =~ X1 + l2*X2 + l3*X3
  X1 ~~ vX1*X1
  X2 ~~ vX2*X2
  X3 ~~ vX3*X3
  FX ~~ v1*FX
  FY =~ Y1 + l5*Y2 + l6*Y3
  FY ~~ v2*FY
  FX ~~ phi0*FY
  v1 > 0.01
  l2 > 0.01
  l3 > 0.01
  v2 > 0.01
  l5 > 0.01
  l6 > 0.01
  vX1 > 0.01
  vX2 > 0.01
  vX3 > 0.01
  Y1 ~~ vY1*Y1
  Y2 ~~ vY2*Y2
  Y3 ~~ vY3*Y3
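  # bound the implied factor correlation: under ULI, phi0 is a covariance,
  # so phi0^2 / (v1*v2) is the squared factor correlation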
  phi0*phi0 / (v1*v2) < 0.98
  phi0*phi0 / (v1*v2) > -0.98
  vY1 > 0.01
  vY2 > 0.01
  vY3 > 0.01
  "
# estimate SEM using covariance matrix S
mod1 <- lavaan::sem(lavmodel, sample.cov=S, sample.nobs=1e5,
              estimator="ML", std.lv=FALSE)
# estimate LSAM
mod2 <- lavaan::sam(lavmodel, sample.cov=S, sample.nobs=1e5,
              sam.method="local", std.lv=FALSE,
              local.options = list(M.method = "ML" ) )

References

  1. Bartholomew, D.J.; Knott, M.; Moustaki, I. Latent Variable Models and Factor Analysis: A Unified Approach; Wiley: New York, NY, USA, 2011. [Google Scholar] [CrossRef]
  2. Basilevsky, A.T. Statistical Factor Analysis and Related Methods: Theory and Applications; Wiley: New York, NY, USA, 2009; Volume 418. [Google Scholar] [CrossRef]
  3. Bollen, K.A. Structural Equations with Latent Variables; John Wiley & Sons: New York, NY, USA, 1989. [Google Scholar] [CrossRef]
  4. Browne, M.W.; Arminger, G. Specification and estimation of mean-and covariance-structure models. In Handbook of Statistical Modeling for the Social and Behavioral Sciences; Arminger, G., Clogg, C.C., Sobel, M.E., Eds.; Springer: Boston, MA, USA, 1995; pp. 185–249. [Google Scholar] [CrossRef]
  5. Jöreskog, K.G. Factor analysis and its extensions. In Factor Analysis at 100; Cudeck, R., MacCallum, R.C., Eds.; Lawrence Erlbaum: Mahwah, NJ, USA, 2007; pp. 47–77. [Google Scholar]
  6. Jöreskog, K.G.; Olsson, U.H.; Wallentin, F.Y. Multivariate Analysis with LISREL; Springer: Basel, Switzerland, 2016. [Google Scholar] [CrossRef]
  7. Mulaik, S.A. Foundations of Factor Analysis; CRC Press: Boca Raton, FL, USA, 2009. [Google Scholar] [CrossRef]
  8. Shapiro, A. Statistical inference of covariance structures. In Current Topics in the Theory and Application of Latent Variable Models; Edwards, M.C., MacCallum, R.C., Eds.; Routledge: London, UK, 2012; pp. 222–240. [Google Scholar] [CrossRef]
  9. Yanai, H.; Ichikawa, M. Factor analysis. In Handbook of Statistics, Volume 26: Psychometrics; Rao, C.R., Sinharay, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2006; pp. 257–297. [Google Scholar] [CrossRef]
  10. Yuan, K.H.; Bentler, P.M. Structural equation modeling. In Handbook of Statistics, Volume 26: Psychometrics; Rao, C.R., Sinharay, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2007; Volume 26, pp. 297–358. [Google Scholar] [CrossRef]
  11. Arminger, G.; Schoenberg, R.J. Pseudo maximum likelihood estimation and a test for misspecification in mean and covariance structure models. Psychometrika 1989, 54, 409–425. [Google Scholar] [CrossRef]
  12. Browne, M.W. Generalized least squares estimators in the analysis of covariance structures. S. Afr. Stat. J. 1974, 8, 1–24. [Google Scholar] [CrossRef]
  13. Curran, P.J.; West, S.G.; Finch, J.F. The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychol. Methods 1996, 1, 16–29. [Google Scholar] [CrossRef]
  14. Yuan, K.H.; Bentler, P.M.; Chan, W. Structural equation modeling with heavy tailed distributions. Psychometrika 2004, 69, 421–436. [Google Scholar] [CrossRef]
  15. Yuan, K.H.; Bentler, P.M. Robust procedures in structural equation modeling. In Handbook of Latent Variable and Related Models; Lee, S.Y., Ed.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 367–397. [Google Scholar] [CrossRef]
  16. Avella Medina, M.; Ronchetti, E. Robust statistics: A selective overview and new directions. WIREs Comput. Stat. 2015, 7, 372–393. [Google Scholar] [CrossRef]
  17. Huber, P.J.; Ronchetti, E.M. Robust Statistics; Wiley: New York, NY, USA, 2009. [Google Scholar] [CrossRef]
  18. Maronna, R.A.; Martin, R.D.; Yohai, V.J. Robust Statistics: Theory and Methods; Wiley: New York, NY, USA, 2006. [Google Scholar] [CrossRef]
  19. Ronchetti, E. The main contributions of robust statistics to statistical science and a new challenge. Metron 2021, 79, 127–135. [Google Scholar] [CrossRef]
  20. Rosseel, Y.; Loh, W.W. A structural after measurement (SAM) approach to structural equation modeling. Psychol. Methods. 2022. Forthcoming. Available online: https://osf.io/pekbm/ (accessed on 28 March 2022).
  21. Briggs, N.E.; MacCallum, R.C. Recovery of weak common factors by maximum likelihood and ordinary least squares estimation. Multivar. Behav. Res. 2003, 38, 25–56. [Google Scholar] [CrossRef]
  22. Cudeck, R.; Browne, M.W. Constructing a covariance matrix that yields a specified minimizer and a specified minimum discrepancy function value. Psychometrika 1992, 57, 357–369. [Google Scholar] [CrossRef]
  23. Kolenikov, S. Biases of parameter estimates in misspecified structural equation models. Sociol. Methodol. 2011, 41, 119–157. [Google Scholar] [CrossRef]
  24. MacCallum, R.C.; Tucker, L.R. Representing sources of error in the common-factor model: Implications for theory and practice. Psychol. Bull. 1991, 109, 502–511. [Google Scholar] [CrossRef]
  25. MacCallum, R.C. 2001 presidential address: Working with imperfect models. Multivar. Behav. Res. 2003, 38, 113–139. [Google Scholar] [CrossRef]
  26. MacCallum, R.C.; Browne, M.W.; Cai, L. Factor analysis models as approximations. In Factor Analysis at 100; Cudeck, R., MacCallum, R.C., Eds.; Lawrence Erlbaum: Mahwah, NJ, USA, 2007; pp. 153–175. [Google Scholar]
  27. Tucker, L.R.; Koopman, R.F.; Linn, R.L. Evaluation of factor analytic research procedures by means of simulated correlation matrices. Psychometrika 1969, 34, 421–459. [Google Scholar] [CrossRef]
  28. Yuan, K.H.; Marshall, L.L.; Bentler, P.M. Assessing the effect of model misspecifications on parameter estimates in structural equation models. Sociol. Methodol. 2003, 33, 241–265. [Google Scholar] [CrossRef]
  29. Boos, D.D.; Stefanski, L.A. Essential Statistical Inference; Springer: New York, NY, USA, 2013. [Google Scholar] [CrossRef]
  30. Huber, P.J. Robust estimation of a location parameter. Ann. Math. Stat. 1964, 35, 73–101. [Google Scholar] [CrossRef]
  31. Ronchetti, E. The historical development of robust statistics. In Proceedings of the 7th International Conference on Teaching Statistics (ICOTS-7), Salvador, Brazil, 2–7 July 2006; pp. 2–7. Available online: https://bit.ly/3aueh6z (accessed on 22 June 2022).
  32. Stefanski, L.A.; Boos, D.D. The calculus of M-estimation. Am. Stat. 2002, 56, 29–38. [Google Scholar] [CrossRef]
  33. White, H. Maximum likelihood estimation of misspecified models. Econometrica 1982, 50, 1–25. [Google Scholar] [CrossRef]
  34. Gourieroux, C.; Monfort, A.; Trognon, A. Pseudo maximum likelihood methods: Theory. Econometrica 1984, 52, 681–700. [Google Scholar] [CrossRef]
  35. Wu, J.W. The quasi-likelihood estimation in regression. Ann. Inst. Stat. Math. 1996, 48, 283–294. [Google Scholar] [CrossRef]
  36. Aronow, P.M.; Miller, B.T. Foundations of Agnostic Statistics; Cambridge University Press: Cambridge, UK, 2019. [Google Scholar] [CrossRef]
  37. Robitzsch, A. Estimation methods of the multiple-group one-dimensional factor model: Implied identification constraints in the violation of measurement invariance. Axioms 2022, 11, 119. [Google Scholar] [CrossRef]
  38. Shapiro, A. Statistical inference of moment structures. In Handbook of Latent Variable and Related Models; Lee, S.Y., Ed.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 229–260. [Google Scholar] [CrossRef]
  39. Chun, S.Y.; Shapiro, A. Construction of covariance matrices with a specified discrepancy function minimizer, with application to factor analysis. SIAM J. Matrix Anal. Appl. 2010, 31, 1570–1583. [Google Scholar] [CrossRef] [Green Version]
  40. Savalei, V. Understanding robust corrections in structural equation modeling. Struct. Equ. Model. 2014, 21, 149–160. [Google Scholar] [CrossRef]
  41. Fox, J.; Weisberg, S. Robust Regression in R: An Appendix to an R Companion to Applied Regression, Second Edition. 2010. Available online: https://bit.ly/3canwcw (accessed on 22 June 2022).
  42. Siemsen, E.; Bollen, K.A. Least absolute deviation estimation in structural equation modeling. Sociol. Methods Res. 2007, 36, 227–265. [Google Scholar] [CrossRef]
  43. van Kesteren, E.J.; Oberski, D.L. Flexible extensions to structural equation models using computation graphs. Struct. Equ. Model. 2022, 29, 233–247. [Google Scholar] [CrossRef]
  44. Asparouhov, T.; Muthén, B. Multiple-group factor analysis alignment. Struct. Equ. Model. 2014, 21, 495–508. [Google Scholar] [CrossRef]
  45. Pokropek, A.; Lüdtke, O.; Robitzsch, A. An extension of the invariance alignment method for scale linking. Psych. Test Assess. Model. 2020, 62, 303–334. [Google Scholar]
  46. Robitzsch, A. Lp loss functions in invariance alignment and Haberman linking with few or many groups. Stats 2020, 3, 246–283. [Google Scholar] [CrossRef]
  47. She, Y.; Owen, A.B. Outlier detection using nonconvex penalized regression. J. Am. Stat. Assoc. 2011, 106, 626–639. [Google Scholar] [CrossRef] [Green Version]
  48. Battauz, M. Regularized estimation of the nominal response model. Multivar. Behav. Res. 2020, 55, 811–824. [Google Scholar] [CrossRef]
  49. Oelker, M.R.; Tutz, G. A uniform framework for the combination of penalties in generalized structured models. Adv. Data Anal. Classif. 2017, 11, 97–120. [Google Scholar] [CrossRef]
  50. Fabrigar, L.R.; Wegener, D.T. Exploratory Factor Analysis; Oxford University Press: Oxford, UK, 2011. [Google Scholar] [CrossRef] [Green Version]
  51. Browne, M.W. An overview of analytic rotation in exploratory factor analysis. Multivar. Behav. Res. 2001, 36, 111–150. [Google Scholar] [CrossRef]
  52. Hattori, M.; Zhang, G.; Preacher, K.J. Multiple local solutions and geomin rotation. Multivar. Behav. Res. 2017, 52, 720–731. [Google Scholar] [CrossRef] [PubMed]
  53. Liu, X.; Wallin, G.; Chen, Y.; Moustaki, I. Rotation to sparse loadings using Lp losses and related inference problems. arXiv 2022, arXiv:2206.02263. [Google Scholar]
  54. Jennrich, R.I. Rotation to simple loadings using component loss functions: The oblique case. Psychometrika 2006, 71, 173–191. [Google Scholar] [CrossRef] [Green Version]
  55. Fan, J.; Liao, Y.; Mincheva, M. Large covariance estimation by thresholding principal orthogonal complements. J. R. Stat. Soc. Series B Stat. Methodol. 2013, 75, 603–680. [Google Scholar] [CrossRef] [Green Version]
  56. Bai, J.; Liao, Y. Efficient estimation of approximate factor models via penalized maximum likelihood. J. Econom. 2016, 191, 1–18. [Google Scholar] [CrossRef]
  57. Pati, D.; Bhattacharya, A.; Pillai, N.S.; Dunson, D. Posterior contraction in sparse Bayesian factor models for massive covariance matrices. Ann. Stat. 2014, 42, 1102–1130. [Google Scholar] [CrossRef] [Green Version]
  58. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
  59. Rothman, A.J.; Levina, E.; Zhu, J. Generalized thresholding of large covariance matrices. J. Am. Stat. Assoc. 2009, 104, 177–186. [Google Scholar] [CrossRef]
  60. Scharf, F.; Nestler, S. Should regularization replace simple structure rotation in exploratory factor analysis? Struct. Equ. Model. 2019, 26, 576–590. [Google Scholar] [CrossRef]
  61. McDonald, R.P. Test Theory: A Unified Treatment; Lawrence Erlbaum: Mahwah, NJ, USA, 1999. [Google Scholar] [CrossRef]
  62. Lüdtke, O.; Ulitzsch, E.; Robitzsch, A. A comparison of penalized maximum likelihood estimation and Markov Chain Monte Carlo techniques for estimating confirmatory factor analysis models with small sample sizes. Front. Psychol. 2021, 12, 615162. [Google Scholar] [CrossRef]
  63. Ulitzsch, E.; Lüdtke, O.; Robitzsch, A. Alleviating estimation problems in small sample structural equation modeling—A comparison of constrained maximum likelihood, Bayesian estimation, and fixed reliability approaches. Psychol. Methods 2021. [Google Scholar] [CrossRef] [PubMed]
  64. R Core Team. R: A Language and Environment for Statistical Computing; R Core Team: Vienna, Austria, 2022; Available online: https://www.R-project.org/ (accessed on 11 January 2022).
  65. Rosseel, Y. lavaan: An R package for structural equation modeling. J. Stat. Softw. 2012, 48, 1–36. [Google Scholar] [CrossRef] [Green Version]
  66. Bernaards, C.A.; Jennrich, R.I. Gradient projection algorithms and software for arbitrary rotation criteria in factor analysis. Educ. Psychol. Meas. 2005, 65, 676–696. [Google Scholar] [CrossRef]
  67. Robitzsch, A. sirt: Supplementary Item Response Theory Models. R package version 3.12-66. 2022. Available online: https://CRAN.R-project.org/package=sirt (accessed on 17 May 2022).
  68. Dhaene, S.; Rosseel, Y. Resampling based bias correction for small sample SEM. Struct. Equ. Model. 2022. [Google Scholar] [CrossRef]
  69. Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap; CRC Press: Boca Raton, FL, USA, 1994. [Google Scholar] [CrossRef]
  70. Chen, J. Partially confirmatory approach to factor analysis with Bayesian learning: A LAWBL tutorial. Struct. Equ. Model. 2022. [Google Scholar] [CrossRef]
  71. Hirose, K.; Terada, Y. Sparse and simple structure estimation via prenet penalization. Psychometrika 2022. [Google Scholar] [CrossRef] [PubMed]
  72. Huang, P.H.; Chen, H.; Weng, L.J. A penalized likelihood method for structural equation modeling. Psychometrika 2017, 82, 329–354. [Google Scholar] [CrossRef]
  73. Huang, P.H. lslx: Semi-confirmatory structural equation modeling via penalized likelihood. J. Stat. Softw. 2020, 93, 1–37. [Google Scholar] [CrossRef]
  74. Jacobucci, R.; Grimm, K.J.; McArdle, J.J. Regularized structural equation modeling. Struct. Equ. Model. 2016, 23, 555–566. [Google Scholar] [CrossRef]
  75. Li, X.; Jacobucci, R.; Ammerman, B.A. Tutorial on the use of the regsem package in R. Psych 2021, 3, 579–592. [Google Scholar] [CrossRef]
  76. Muthén, B.; Asparouhov, T. Bayesian structural equation modeling: A more flexible representation of substantive theory. Psychol. Methods 2012, 17, 313–335. [Google Scholar] [CrossRef] [PubMed]
  77. Rowe, D.B. Multivariate Bayesian Statistics: Models for Source Separation and Signal Unmixing; Chapman and Hall/CRC: Boca Raton, FL, USA, 2002. [Google Scholar] [CrossRef]
  78. Liang, X. Prior sensitivity in Bayesian structural equation modeling for sparse factor loading structures. Educ. Psychol. Meas. 2020, 80, 1025–1058. [Google Scholar] [CrossRef] [PubMed]
  79. Lodewyckx, T.; Tuerlinckx, F.; Kuppens, P.; Allen, N.B.; Sheeber, L. A hierarchical state space approach to affective dynamics. J. Math. Psychol. 2011, 55, 68–83. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  80. Devlieger, I.; Mayer, A.; Rosseel, Y. Hypothesis testing using factor score regression: A comparison of four methods. Educ. Psychol. Meas. 2016, 76, 741–770. [Google Scholar] [CrossRef] [Green Version]
  81. Devlieger, I.; Talloen, W.; Rosseel, Y. New developments in factor score regression: Fit indices and a model comparison test. Educ. Psychol. Meas. 2019, 79, 1017–1037. [Google Scholar] [CrossRef]
  82. Kelcey, B.; Cox, K.; Dong, N. Croon’s bias-corrected factor score path analysis for small-to moderate-sample multilevel structural equation models. Organ. Res. Methods 2021, 24, 55–77. [Google Scholar] [CrossRef]
  83. Zitzmann, S.; Helm, C. Multilevel analysis of mediation, moderation, and nonlinear effects in small samples, using expected a posteriori estimates of factor scores. Struct. Equ. Model. 2021, 28, 529–546. [Google Scholar] [CrossRef]
  84. Zitzmann, S.; Lohmann, J.F.; Krammer, G.; Helm, C.; Aydin, B.; Hecht, M. A Bayesian EAP-based nonlinear extension of Croon and Van Veldhoven’s model for analyzing data from micro-macro multilevel designs. Mathematics 2022, 10, 842. [Google Scholar] [CrossRef]
  85. Anderson, J.C.; Gerbing, D.W. Structural equation modeling in practice: A review and recommended two-step approach. Psychol. Bull. 1988, 103, 411–423. [Google Scholar] [CrossRef]
  86. Burt, R.S. Interpretational confounding of unobserved variables in structural equation models. Sociol. Methods Res. 1976, 5, 3–52. [Google Scholar] [CrossRef]
  87. Fornell, C.; Yi, Y. Assumptions of the two-step approach to latent variable modeling. Sociol. Methods Res. 1992, 20, 291–320. [Google Scholar] [CrossRef]
88. McDonald, R.P. Structural models and the art of approximation. Perspect. Psychol. Sci. 2010, 5, 675–686.
89. Marsh, H.W.; Morin, A.J.S.; Parker, P.D.; Kaur, G. Exploratory structural equation modeling: An integration of the best features of exploratory and confirmatory factor analysis. Annu. Rev. Clin. Psychol. 2014, 10, 85–110.
90. Brennan, R.L. Misconceptions at the intersection of measurement theory and practice. Educ. Meas. 1998, 17, 5–9.
91. Uher, J. Psychometrics is not measurement: Unraveling a fundamental misconception in quantitative psychology and the complex network of its underlying fallacies. J. Theor. Philos. Psychol. 2021, 41, 58–84.
92. Grønneberg, S.; Foldnes, N. Factor analyzing ordinal items requires substantive knowledge of response marginals. Psychol. Methods 2022.
93. Jorgensen, T.D.; Johnson, A.R. How to derive expected values of structural equation model parameters when treating discrete data as continuous. Struct. Equ. Model. 2022.
94. Rhemtulla, M.; Brosseau-Liard, P.É.; Savalei, V. When can categorical variables be treated as continuous? A comparison of robust continuous and categorical SEM estimation methods under suboptimal conditions. Psychol. Methods 2012, 17, 354–373.
95. Robitzsch, A. Why ordinal variables can (almost) always be treated as continuous variables: Clarifying assumptions of robust continuous and ordinal factor analysis estimation methods. Front. Educ. 2020, 5, 589965.
96. Robitzsch, A. On the bias in confirmatory factor analysis when treating discrete variables as ordinal instead of continuous. Axioms 2022, 11, 162.
97. Davidov, E.; Meuleman, B.; Cieciuch, J.; Schmidt, P.; Billiet, J. Measurement equivalence in cross-national research. Annu. Rev. Sociol. 2014, 40, 55–75.
98. Millsap, R.E. Statistical Approaches to Measurement Invariance; Routledge: New York, NY, USA, 2011.
99. VanderWeele, T.J. Constructed measures and causal inference: Towards a new model of measurement for psychosocial constructs. Epidemiology 2022, 33, 141–151.
100. Westfall, P.H.; Henning, K.S.; Howell, R.D. The effect of error correlation on interfactor correlation in psychometric measurement. Struct. Equ. Model. 2012, 19, 99–117.
101. Funder, D. Misgivings: Some thoughts about “Measurement Invariance”. 2020. Available online: https://bit.ly/3caKdNN (accessed on 31 January 2020).
102. Robitzsch, A.; Lüdtke, O. Reflections on analytical choices in the scaling model for test scores in international large-scale assessment studies. PsyArXiv 2021.
103. Welzel, C.; Inglehart, R.F. Misconceptions of measurement equivalence: Time for a paradigm shift. Comp. Political Stud. 2016, 49, 1068–1094.
104. McDonald, R.P. The theoretical foundations of principal factor analysis, canonical factor analysis, and alpha factor analysis. Brit. J. Math. Stat. Psychol. 1970, 23, 1–21.
105. Zinbarg, R.E.; Revelle, W.; Yovel, I.; Li, W. Cronbach’s α, Revelle’s β, and McDonald’s ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika 2005, 70, 123–133.
106. Cronbach, L.J. Coefficient alpha and the internal structure of tests. Psychometrika 1951, 16, 297–334.
107. Cronbach, L.J.; Schönemann, P.; McKie, D. Alpha coefficients for stratified-parallel tests. Educ. Psychol. Meas. 1965, 25, 291–312.
108. Ellis, J.L. A test can have multiple reliabilities. Psychometrika 2021, 86, 869–876.
109. Nunnally, J.C.; Bernstein, I.R. Psychometric Theory; Oxford University Press: New York, NY, USA, 1994.
110. Brennan, R.L. Generalizability theory and classical test theory. Appl. Meas. Educ. 2010, 24, 1–21.
111. Cronbach, L.J.; Shavelson, R.J. My current thoughts on coefficient alpha and successor procedures. Educ. Psychol. Meas. 2004, 64, 391–418.
112. Tryon, R.C. Reliability and behavior domain validity: Reformulation and historical critique. Psychol. Bull. 1957, 54, 229–249.
113. McNeish, D. Thanks coefficient alpha, we’ll take it from here. Psychol. Methods 2018, 23, 412–433.
114. Kane, M. The errors of our ways. J. Educ. Meas. 2011, 48, 12–30.
115. Carroll, R.J.; Ruppert, D.; Stefanski, L.A.; Crainiceanu, C.M. Measurement Error in Nonlinear Models: A Modern Perspective; Chapman and Hall/CRC: Boca Raton, FL, USA, 2006.
116. Feldt, L.S. Can validity rise when reliability declines? Appl. Meas. Educ. 1997, 10, 377–387.
117. Kane, M.T. A sampling model for validity. Appl. Psychol. Meas. 1982, 6, 125–160.
118. Heene, M.; Hilbert, S.; Draxler, C.; Ziegler, M.; Bühner, M. Masking misfit in confirmatory factor analysis by increasing unique variances: A cautionary note on the usefulness of cutoff values of fit indices. Psychol. Methods 2011, 16, 319–336.
119. Hu, L.-T.; Bentler, P.M. Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Struct. Equ. Model. 1999, 6, 1–55.
120. McNeish, D.; Wolf, M.G. Dynamic fit index cutoffs for confirmatory factor analysis models. Psychol. Methods 2021.
121. Moshagen, M. The model size effect in SEM: Inflated goodness-of-fit statistics are due to the size of the covariance matrix. Struct. Equ. Model. 2012, 19, 86–98.
122. Wu, H.; Browne, M.W. Quantifying adventitious error in a covariance structure as a random effect. Psychometrika 2015, 80, 571–600.
123. Robitzsch, A.; Dörfler, T.; Pfost, M.; Artelt, C. Die Bedeutung der Itemauswahl und der Modellwahl für die längsschnittliche Erfassung von Kompetenzen [Relevance of item selection and model selection for assessing the development of competencies: The development in reading competence in primary school students]. Z. Entwicklungspsychol. Pädagog. Psychol. 2011, 43, 213–227.
124. Robitzsch, A. Is it really more robust? Comparing the robustness of the structural after measurement (SAM) approach to structural equation modeling (SEM) against local model misspecifications with alternative estimation approaches. PsyArXiv 2022.
Figure 1. Two-factor CFA model: analysis model.
Figure 2. Two-factor CFA model: Data-generating model with local model misspecifications. Residual correlations are displayed with curved dashed red double arrows. Cross loadings are displayed with dashed red arrows.
Figure 3. Simulation Study 1: Data-generating model. Residual correlations are displayed with curved dashed red double arrows.
Figure 4. Simulation Study 2: Data-generating model. Cross loadings are displayed with dashed red arrows.
Figure 5. Simulation Study 3: Data-generating model. Residual correlations are displayed with curved dashed red double arrows. Cross loadings are displayed with dashed red arrows.
Figure 6. Simulation Study 4: Data-generating model without cross loadings or residual correlations (DGM1).
Figure 7. Simulation Study 4: Definition of the structural model. A common regression coefficient β is assumed for all paths.
Figure 8. Simulation Study 4: Data-generating model with cross loadings (DGM2). Cross loadings are displayed with dashed red arrows.
Figure 9. Simulation Study 4: Data-generating model with residual correlations (DGM3). Residual correlations are displayed with curved dashed red double arrows.
Figure 10. Simulation Study 5: Data-generating model. Cross loadings are displayed with dashed red arrows.
Figure 11. Simulation Study 6: Data-generating model. Residual correlations are displayed with curved dashed red double arrows.
Table 1. Simulation Study 1 (two factors and residual correlations): bias and root mean square error (RMSE) of estimated factor correlation as a function of sample size with no, one positive, and two positive residual correlations in the data-generating model.
Method | Bias for N = 100, 250, 500, 1000, 2500, 10^5 | RMSE for N = 100, 250, 500, 1000, 2500, 10^5
No residual correlations
ML−0.01 0.00 0.00 0.00 0.00 0.00 0.21 0.12 0.09 0.06 0.04 0.01
ULS 0.01 0.01 0.01 0.00 0.00 0.00 0.20 0.12 0.08 0.06 0.04 0.01
RME 0.01 0.01 0.01 0.00 0.00 0.00 0.22 0.14 0.10 0.07 0.04 0.01
LSAM−0.16−0.07−0.03−0.01−0.01 0.00 0.25 0.14 0.09 0.06 0.04 0.01
GSAM-ML−0.16−0.07−0.03−0.01−0.01 0.00 0.25 0.14 0.09 0.06 0.04 0.01
GSAM-ULS−0.11−0.05−0.02−0.01 0.00 0.00 0.25 0.13 0.09 0.06 0.04 0.01
GSAM-RME−0.12−0.05−0.02−0.01 0.00 0.00 0.26 0.15 0.11 0.07 0.04 0.01
Geomin−0.33−0.24−0.16−0.10−0.06−0.03 0.35 0.27 0.19 0.12 0.07 0.03
Lp−0.34−0.27−0.20−0.14−0.07 0.00 0.37 0.29 0.22 0.16 0.09 0.01
Geomin(THR)−0.32−0.24−0.16−0.10−0.06−0.03 0.34 0.26 0.19 0.12 0.07 0.03
Lp(THR)−0.33−0.26−0.20−0.14−0.07 0.00 0.36 0.28 0.22 0.16 0.09 0.01
Geomin(RME)−0.11−0.07−0.04−0.01 0.00 0.00 0.26 0.17 0.11 0.07 0.04 0.01
Lp(RME)−0.01 0.06 0.08 0.06 0.03 0.00 0.27 0.19 0.16 0.14 0.09 0.01
Geomin(THR,RME)−0.10−0.07−0.05−0.02 0.00 0.00 0.27 0.17 0.11 0.07 0.04 0.01
Lp(THR,RME) 0.01 0.06 0.08 0.06 0.03 0.00 0.26 0.19 0.16 0.13 0.09 0.01
One positive residual correlation
ML 0.05 0.06 0.07 0.07 0.07 0.07 0.21 0.14 0.11 0.09 0.08 0.07
ULS 0.07 0.06 0.06 0.06 0.06 0.06 0.20 0.13 0.10 0.08 0.07 0.06
RME 0.06 0.05 0.04 0.02 0.01 0.00 0.22 0.15 0.11 0.07 0.04 0.01
LSAM−0.12−0.02 0.02 0.04 0.05 0.05 0.23 0.13 0.09 0.07 0.06 0.05
GSAM-ML−0.12−0.02 0.02 0.04 0.05 0.05 0.23 0.13 0.09 0.07 0.06 0.05
GSAM-ULS−0.06 0.00 0.03 0.04 0.05 0.05 0.38 0.12 0.09 0.07 0.06 0.05
GSAM-RME−0.07−0.02 0.01 0.01 0.01 0.00 0.39 0.15 0.11 0.07 0.04 0.01
Geomin−0.32−0.26−0.20−0.15−0.10−0.02 0.34 0.28 0.22 0.18 0.13 0.03
Lp−0.33−0.28−0.24−0.19−0.14−0.02 0.36 0.30 0.26 0.22 0.17 0.04
Geomin(THR)−0.31−0.23−0.18−0.12−0.07−0.03 0.33 0.26 0.20 0.14 0.09 0.03
Lp(THR)−0.32−0.26−0.22−0.16−0.09 0.00 0.35 0.29 0.24 0.18 0.12 0.01
Geomin(RME)−0.09−0.05−0.04−0.03−0.02 0.00 0.25 0.18 0.13 0.09 0.06 0.01
Lp(RME)−0.01 0.04 0.05 0.05 0.03 0.00 0.25 0.19 0.15 0.12 0.07 0.02
Geomin(THR,RME)−0.08−0.04−0.03−0.01 0.01 0.00 0.25 0.17 0.12 0.08 0.05 0.01
Lp(THR,RME) 0.02 0.06 0.07 0.05 0.03 0.00 0.25 0.19 0.15 0.12 0.07 0.01
Two positive residual correlations
ML 0.10 0.12 0.12 0.12 0.12 0.12 0.22 0.17 0.14 0.13 0.12 0.12
ULS 0.12 0.12 0.12 0.11 0.11 0.11 0.21 0.16 0.14 0.13 0.12 0.11
RME 0.09 0.08 0.05 0.03 0.01 0.00 0.22 0.16 0.12 0.08 0.04 0.01
LSAM−0.08 0.03 0.07 0.09 0.10 0.11 0.22 0.13 0.11 0.11 0.11 0.11
GSAM-ML−0.08 0.03 0.07 0.09 0.10 0.11 0.22 0.13 0.11 0.11 0.11 0.11
GSAM-ULS−0.03 0.05 0.09 0.10 0.10 0.11 0.19 0.13 0.12 0.11 0.11 0.11
GSAM-RME−0.04 0.03 0.04 0.03 0.01 0.00 0.24 0.16 0.12 0.08 0.05 0.01
Geomin−0.32−0.27−0.24−0.24−0.23−0.27 0.34 0.29 0.26 0.25 0.25 0.28
Lp−0.33−0.29−0.27−0.27−0.27−0.30 0.36 0.31 0.29 0.29 0.28 0.31
Geomin(THR)−0.30−0.23−0.19−0.16−0.13−0.12 0.33 0.25 0.21 0.19 0.17 0.18
Lp(THR)−0.32−0.26−0.23−0.21−0.18−0.16 0.35 0.29 0.25 0.23 0.21 0.22
Geomin(RME)−0.09−0.05−0.04−0.03−0.02 0.00 0.25 0.18 0.14 0.10 0.06 0.01
Lp(RME)−0.01 0.04 0.05 0.04 0.03 0.01 0.24 0.19 0.16 0.12 0.07 0.01
Geomin(THR,RME)−0.07−0.03−0.02−0.01 0.00 0.00 0.25 0.18 0.13 0.10 0.06 0.01
Lp(THR,RME) 0.02 0.07 0.07 0.06 0.03 0.02 0.24 0.19 0.16 0.12 0.07 0.07
Note. N = sample size; Method = estimation method; ML = maximum likelihood; ULS = unweighted least squares; RME = robust moment estimation with p = 0.5; SAM = structural after measurement; LSAM = local SAM; GSAM = global SAM; Geomin = Geomin factor rotation; Lp = Lp factor rotation with p = 0.5; THR = thresholding error covariance matrix; Geomin(RME) = Geomin factor rotation with subsequent RME; Lp(RME) = Lp factor rotation with subsequent RME; absolute biases larger than 0.05 are printed in bold.
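For orientation when reading the RME and Lp rows in the tables, the following display sketches the type of robustified discrepancy function these labels refer to; it is a generic Lp-type loss on covariance residuals and is intended only as a reminder of the general idea, since the exact fit functions are defined in the main text. Writing s_ij for the sample covariances and σ_ij(θ) for the model-implied covariances, an estimator of this type with exponent p = 0.5 downweights a few large residuals (for example, those induced by an unmodeled residual correlation) relative to ML or ULS:

\hat{\boldsymbol{\theta}}_{\mathrm{RME}} = \underset{\boldsymbol{\theta}}{\arg\min} \; \sum_{i \leq j} \left| s_{ij} - \sigma_{ij}(\boldsymbol{\theta}) \right|^{p}, \qquad p = 0.5 .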
Table 2. Simulation Study 1 (two factors and residual correlations): standard deviation of estimated factor correlation as a function of sample size with no residual correlations in the data-generating model.
Method | SD for N = 100, 250, 500, 1000, 2500, 10^5
ML                0.211  0.123  0.085  0.059  0.037  0.006
ULS               0.198  0.119  0.084  0.058  0.037  0.006
RME               0.220  0.143  0.101  0.068  0.043  0.006
LSAM              0.194  0.125  0.086  0.058  0.037  0.006
GSAM-ML           0.194  0.125  0.086  0.058  0.037  0.006
GSAM-ULS          0.225  0.119  0.084  0.058  0.037  0.006
GSAM-RME          0.228  0.146  0.103  0.069  0.042  0.006
Geomin            0.122  0.107  0.092  0.063  0.038  0.006
Lp                0.133  0.111  0.095  0.077  0.056  0.006
Geomin(THR)       0.124  0.104  0.088  0.064  0.039  0.006
Lp(THR)           0.134  0.109  0.093  0.078  0.057  0.006
Geomin(RME)       0.241  0.156  0.101  0.065  0.040  0.006
Lp(RME)           0.266  0.182  0.144  0.121  0.081  0.007
Geomin(THR,RME)   0.245  0.154  0.101  0.065  0.040  0.006
Lp(THR,RME)       0.263  0.182  0.143  0.120  0.081  0.007
Note. N = sample size; Method = estimation method; ML = maximum likelihood; ULS = unweighted least squares; RME = robust moment estimation with p = 0.5; SAM = structural after measurement; LSAM = local SAM; GSAM = global SAM; Geomin = Geomin factor rotation; Lp = Lp factor rotation with p = 0.5; THR = thresholding error covariance matrix; Geomin(RME) = Geomin factor rotation with subsequent RME; Lp(RME) = Lp factor rotation with subsequent RME.
Table 3. Focused Simulation Study 1A (two factors and residual correlations): bias and root mean square error (RMSE) of estimated factor correlation as a function of sample size with no, one positive, and two positive residual correlations in the data-generating model, under unit latent variance identification (UVI) and unit loading identification (ULI).
Method | Bias for N = 100, 250, 500, 1000, 2500, 10^5 | RMSE for N = 100, 250, 500, 1000, 2500, 10^5
No residual correlations
ML (UVI)−0.01 0.00 0.00 0.00 0.00 0.00 0.21 0.12 0.08 0.06 0.04 0.01
ML (ULI)−0.01 0.00 0.00 0.00 0.00 0.00 0.20 0.12 0.08 0.06 0.04 0.01
ULS (UVI) 0.01 0.00 0.00 0.00 0.00 0.00 0.19 0.12 0.08 0.06 0.04 0.01
ULS (ULI) 0.00 0.00 0.00 0.00 0.00 0.00 0.19 0.12 0.08 0.06 0.04 0.01
LSAM (UVI)−0.16−0.07−0.03−0.02−0.01 0.00 0.25 0.15 0.09 0.06 0.04 0.01
LSAM (ULI)−0.14−0.07−0.03−0.02−0.01 0.00 0.23 0.14 0.09 0.06 0.04 0.01
One positive residual correlation
ML (UVI) 0.06 0.06 0.07 0.07 0.07 0.07 0.20 0.14 0.11 0.09 0.08 0.07
ML (ULI) 0.06 0.06 0.07 0.07 0.07 0.07 0.20 0.14 0.11 0.09 0.08 0.07
ULS (UVI) 0.07 0.07 0.07 0.06 0.06 0.06 0.19 0.14 0.11 0.09 0.07 0.06
ULS (ULI) 0.07 0.07 0.07 0.06 0.06 0.06 0.19 0.14 0.11 0.09 0.07 0.06
LSAM (UVI)−0.12−0.02 0.03 0.04 0.05 0.05 0.23 0.14 0.09 0.07 0.06 0.05
LSAM (ULI)−0.09−0.02 0.03 0.04 0.05 0.05 0.21 0.13 0.09 0.07 0.06 0.05
Two positive residual correlations
ML (UVI) 0.12 0.12 0.12 0.12 0.11 0.12 0.22 0.17 0.14 0.13 0.12 0.12
ML (ULI) 0.12 0.12 0.12 0.12 0.11 0.12 0.22 0.17 0.14 0.13 0.12 0.12
ULS (UVI) 0.13 0.12 0.11 0.12 0.11 0.11 0.22 0.16 0.14 0.13 0.12 0.11
ULS (ULI) 0.13 0.12 0.11 0.12 0.11 0.11 0.22 0.16 0.14 0.13 0.12 0.11
LSAM (UVI)−0.08 0.02 0.07 0.09 0.10 0.11 0.22 0.13 0.11 0.11 0.10 0.11
LSAM (ULI)−0.05 0.03 0.07 0.09 0.10 0.11 0.20 0.13 0.11 0.11 0.10 0.11
Note. N = sample size; Method = estimation method; ML = maximum likelihood; ULS = unweighted least squares; SAM = structural after measurement; LSAM = local SAM; absolute biases larger than 0.05 are printed in bold.
Table 4. Focused Simulation Study 1B (two factors and residual correlations): bias for the LSAM estimation approach of estimated factor correlation with a sample size N = 100 and no residual correlations in the data-generating model as a function of the factor loading λ and the true factor correlation ϕ.
λ | ϕ = 0.0, 0.2, 0.4, 0.6, 0.8
0.4 |  0.00  −0.09  −0.20  −0.29  −0.39
0.5 |  0.01  −0.05  −0.10  −0.16  −0.20
0.6 |  0.00  −0.02  −0.04  −0.07  −0.09
0.7 |  0.00   0.00  −0.02  −0.03  −0.04
0.8 |  0.00   0.00  −0.01  −0.02  −0.02
Note. Absolute biases larger than 0.05 are printed in bold.
Table 5. Focused Simulation Study 1C (two factors and residual correlations): bias, standard deviation (SD), and root mean square error (RMSE) of estimated factor correlation as a function of sample size with no, one positive, and two positive residual correlations in the data-generating model.
Method | Bias for N = 100, 250, 500, 1000, 2500 | SD for N = 100, 250, 500, 1000, 2500 | RMSE for N = 100, 250, 500, 1000, 2500
No residual correlations
ML−0.02−0.01 0.00 0.00 0.00 0.21 0.13 0.08 0.06 0.04 0.21 0.13 0.08 0.06 0.04
ML (BBC) 0.00 0.00 0.00 0.00 0.00 0.25 0.13 0.08 0.06 0.04 0.25 0.13 0.08 0.06 0.04
LSAM−0.16−0.07−0.03−0.01 0.00 0.19 0.12 0.08 0.06 0.04 0.25 0.14 0.09 0.06 0.04
LSAM (BBC)−0.08 0.00 0.01 0.00 0.00 0.27 0.15 0.09 0.06 0.04 0.28 0.15 0.09 0.06 0.04
One positive residual correlation
ML 0.06 0.06 0.07 0.07 0.07 0.21 0.13 0.08 0.06 0.04 0.22 0.14 0.11 0.09 0.08
ML (BBC) 0.09 0.07 0.06 0.07 0.07 0.25 0.14 0.09 0.06 0.04 0.27 0.15 0.11 0.09 0.08
LSAM−0.11−0.03 0.02 0.04 0.05 0.20 0.13 0.09 0.06 0.04 0.23 0.14 0.09 0.07 0.06
LSAM (BBC)−0.03 0.04 0.06 0.06 0.06 0.29 0.16 0.10 0.06 0.04 0.29 0.17 0.11 0.08 0.07
Two positive residual correlations
ML 0.10 0.11 0.12 0.11 0.12 0.20 0.13 0.08 0.06 0.04 0.22 0.17 0.14 0.13 0.12
ML (BBC) 0.14 0.12 0.12 0.11 0.12 0.24 0.13 0.08 0.06 0.04 0.27 0.18 0.14 0.13 0.12
LSAM−0.09 0.02 0.07 0.09 0.10 0.21 0.14 0.09 0.06 0.04 0.23 0.14 0.11 0.10 0.11
LSAM (BBC) 0.01 0.10 0.11 0.11 0.11 0.30 0.17 0.10 0.06 0.04 0.30 0.20 0.14 0.12 0.11
Note. N = sample size; Method = estimation method; ML = maximum likelihood; SAM = structural after measurement; LSAM = local SAM; BBC = bootstrap bias correction; absolute biases larger than 0.05 are printed in bold.
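The BBC rows in Table 5 refer to a nonparametric bootstrap bias correction as it is conventionally defined: the bias of an estimator is estimated by the mean of its bootstrap replicates minus the original estimate, and this estimated bias is then subtracted from the original estimate. The following minimal, self-contained Python sketch illustrates this generic recipe with an ordinary Pearson correlation standing in for the LSAM factor correlation; the function and variable names are illustrative, and this is not the implementation used in the simulation studies.

import numpy as np

def bootstrap_bias_corrected(estimate_fn, data, n_boot=1000, seed=0):
    # Return (plug-in estimate, bootstrap-bias-corrected estimate).
    # estimate_fn maps an (n x k) data matrix to a scalar estimate.
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    theta_hat = estimate_fn(data)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample rows with replacement
        boot[b] = estimate_fn(data[idx])
    bias_hat = boot.mean() - theta_hat        # bootstrap estimate of the bias
    return theta_hat, theta_hat - bias_hat    # equals 2 * theta_hat - mean(boot)

def corr(d):
    # Sample correlation of the first two columns (stand-in for the LSAM factor correlation).
    return np.corrcoef(d, rowvar=False)[0, 1]

rng = np.random.default_rng(1)
x = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=100)
raw, corrected = bootstrap_bias_corrected(corr, x)
print(round(raw, 3), round(corrected, 3))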
Table 6. Simulation Study 2 (two factors and cross loadings): bias and root mean square error (RMSE) of estimated factor correlation as a function of sample size with one and two positive cross loadings in the data-generating model.
Method | Bias for N = 100, 250, 500, 1000, 2500, 10^5 | RMSE for N = 100, 250, 500, 1000, 2500, 10^5
One positive cross-loading
ML 0.12 0.13 0.13 0.12 0.13 0.12 0.21 0.17 0.15 0.13 0.13 0.12
ULS 0.13 0.13 0.12 0.12 0.12 0.12 0.21 0.17 0.14 0.13 0.12 0.12
RME 0.13 0.13 0.13 0.12 0.12 0.12 0.22 0.18 0.16 0.14 0.13 0.12
LSAM−0.07 0.05 0.09 0.10 0.11 0.12 0.21 0.14 0.12 0.11 0.12 0.12
GSAM-ML−0.07 0.05 0.09 0.10 0.11 0.12 0.21 0.14 0.12 0.11 0.12 0.12
GSAM-ULS−0.01 0.08 0.10 0.11 0.12 0.12 0.19 0.14 0.13 0.12 0.12 0.12
GSAM-RME−0.03 0.06 0.08 0.08 0.05 0.01 0.22 0.16 0.13 0.11 0.08 0.01
Geomin−0.26−0.18−0.11−0.05−0.02 0.00 0.29 0.21 0.14 0.08 0.04 0.01
Lp−0.30−0.22−0.16−0.12−0.07 0.00 0.33 0.25 0.19 0.14 0.09 0.01
Geomin(THR)−0.26−0.18−0.11−0.06−0.02 0.00 0.29 0.21 0.14 0.09 0.04 0.01
Lp(THR)−0.29−0.23−0.17−0.13−0.07 0.00 0.32 0.26 0.20 0.15 0.09 0.01
Geomin(RME)−0.03 0.00 0.00 0.00 0.00 0.00 0.21 0.16 0.13 0.09 0.05 0.01
Lp(RME) 0.06 0.10 0.10 0.10 0.09 0.08 0.24 0.20 0.17 0.14 0.12 0.10
Geomin(THR,RME)−0.03 0.01 0.01 0.00 0.00 0.00 0.22 0.17 0.13 0.09 0.05 0.01
Lp(THR,RME) 0.07 0.10 0.10 0.10 0.09 0.08 0.23 0.20 0.17 0.15 0.12 0.10
Two positive cross-loadings
ML 0.22 0.24 0.24 0.24 0.24 0.24 0.25 0.25 0.24 0.24 0.24 0.24
ULS 0.22 0.23 0.23 0.23 0.23 0.23 0.25 0.25 0.24 0.23 0.23 0.23
RME 0.22 0.23 0.24 0.24 0.25 0.22 0.26 0.26 0.25 0.25 0.25 0.22
LSAM 0.07 0.17 0.21 0.23 0.24 0.25 0.19 0.20 0.22 0.24 0.24 0.25
GSAM-ML 0.07 0.17 0.21 0.23 0.24 0.25 0.19 0.20 0.22 0.24 0.24 0.25
GSAM-ULS 0.14 0.21 0.23 0.25 0.25 0.26 0.21 0.23 0.24 0.25 0.26 0.26
GSAM-RME 0.11 0.18 0.20 0.21 0.24 0.27 0.24 0.22 0.23 0.23 0.25 0.27
Geomin−0.21−0.14−0.07−0.02 0.02 0.03 0.24 0.18 0.13 0.09 0.05 0.03
Lp−0.25−0.18−0.13−0.10−0.07 0.00 0.28 0.22 0.17 0.13 0.09 0.01
Geomin(THR)−0.21−0.14−0.08−0.02 0.02 0.03 0.25 0.18 0.13 0.09 0.05 0.03
Lp(THR)−0.25−0.19−0.14−0.10−0.07 0.00 0.29 0.22 0.18 0.13 0.09 0.01
Geomin(RME) 0.04 0.08 0.08 0.06 0.02 0.00 0.22 0.18 0.16 0.13 0.08 0.01
Lp(RME) 0.12 0.17 0.18 0.19 0.19 0.18 0.22 0.22 0.21 0.21 0.20 0.20
Geomin(THR,RME) 0.04 0.09 0.09 0.06 0.02 0.00 0.22 0.19 0.16 0.13 0.08 0.01
Lp(THR,RME) 0.13 0.17 0.17 0.19 0.18 0.18 0.23 0.23 0.21 0.21 0.20 0.20
Note. N = sample size; Method = estimation method; ML = maximum likelihood; ULS = unweighted least squares; RME = robust moment estimation with p = 0.5; SAM = structural after measurement; LSAM = local SAM; GSAM = global SAM; Geomin = Geomin factor rotation; Lp = Lp factor rotation with p = 0.5; THR = thresholding error covariance matrix; Geomin(RME) = Geomin factor rotation with subsequent RME; Lp(RME) = Lp factor rotation with subsequent RME; absolute biases larger than 0.05 are printed in bold.
Table 7. Simulation Study 3 (two factors and one residual correlation and one cross loading): bias and root mean square error (RMSE) of estimated factor correlation as a function of sample size with one positive residual correlation and one positive cross loading in the data-generating model.
Method | Bias for N = 100, 250, 500, 1000, 2500, 10^5 | RMSE for N = 100, 250, 500, 1000, 2500, 10^5
ML 0.16 0.17 0.17 0.17 0.17 0.17 0.23 0.20 0.19 0.18 0.17 0.17
ULS 0.17 0.16 0.17 0.16 0.17 0.16 0.23 0.19 0.18 0.17 0.17 0.16
RME 0.16 0.15 0.15 0.14 0.13 0.12 0.24 0.20 0.18 0.15 0.14 0.12
LSAM−0.02 0.09 0.13 0.14 0.16 0.16 0.20 0.15 0.15 0.16 0.16 0.16
GSAM-ML−0.02 0.09 0.13 0.14 0.16 0.16 0.20 0.15 0.15 0.16 0.16 0.16
GSAM-ULS 0.03 0.11 0.14 0.15 0.16 0.17 0.19 0.16 0.16 0.16 0.17 0.17
GSAM-RME 0.02 0.10 0.13 0.12 0.12 0.06 0.22 0.18 0.17 0.15 0.14 0.11
Geomin−0.26−0.19−0.15−0.11−0.06 0.03 0.29 0.22 0.18 0.14 0.09 0.03
Lp−0.29−0.24−0.20−0.17−0.12−0.05 0.32 0.26 0.22 0.19 0.14 0.05
Geomin(THR)−0.25−0.19−0.14−0.08−0.04 0.00 0.28 0.21 0.16 0.11 0.07 0.01
Lp(THR)−0.29−0.23−0.19−0.15−0.10 0.00 0.32 0.26 0.21 0.17 0.12 0.01
Geomin(RME)−0.03 0.01 0.03 0.04 0.03 0.00 0.22 0.17 0.14 0.12 0.09 0.01
Lp(RME) 0.08 0.13 0.13 0.12 0.13 0.16 0.23 0.20 0.18 0.16 0.16 0.17
Geomin(THR,RME)−0.01 0.03 0.04 0.04 0.03 0.00 0.22 0.17 0.14 0.11 0.08 0.01
Lp(THR,RME) 0.09 0.13 0.13 0.11 0.11 0.10 0.24 0.21 0.19 0.16 0.15 0.13
Note. N = sample size; Method = estimation method; ML = maximum likelihood; ULS = unweighted least squares; RME = robust moment estimation with p = 0.5; SAM = structural after measurement; LSAM = local SAM; GSAM = global SAM; Geomin = Geomin factor rotation; Lp = Lp factor rotation with p = 0.5; THR = thresholding error covariance matrix; Geomin(RME) = Geomin factor rotation with subsequent RME; Lp(RME) = Lp factor rotation with subsequent RME; absolute biases larger than 0.05 are printed in bold.
Table 8. Simulation Study 4 (five factors): Average absolute bias and average root mean square error (RMSE) of estimated factor correlation as a function of sample size for the three data-generating models DGM1, DGM2, and DGM3.
Method | Bias for N = 100, 250, 500, 1000, 2500, 10^5 | RMSE for N = 100, 250, 500, 1000, 2500, 10^5
Correctly specified model (DGM1)
ML     0.00  0.00  0.00  0.00  0.00  0.00 | 0.15  0.09  0.06  0.05  0.03  0.00
ULS    0.00  0.00  0.00  0.00  0.00  0.00 | 0.15  0.09  0.07  0.05  0.03  0.00
LSAM   0.01  0.00  0.00  0.00  0.00  0.00 | 0.14  0.09  0.06  0.04  0.03  0.00
RME    0.00  0.00  0.00  0.00  0.00  0.00 | 0.16  0.10  0.07  0.05  0.03  0.00
Cross-Loadings (DGM2)
ML     0.08  0.09  0.09  0.09  0.09  0.09 | 0.20  0.16  0.13  0.12  0.10  0.09
ULS    0.11  0.12  0.12  0.12  0.12  0.12 | 0.22  0.17  0.15  0.14  0.13  0.12
LSAM   0.05  0.05  0.06  0.06  0.06  0.06 | 0.16  0.12  0.10  0.08  0.07  0.06
RME    0.08  0.06  0.06  0.05  0.05  0.03 | 0.21  0.15  0.12  0.10  0.08  0.04
Residual correlations (DGM3)
ML     0.25  0.27  0.27  0.27  0.26  0.21 | 0.29  0.28  0.28  0.27  0.27  0.21
ULS    0.14  0.14  0.14  0.14  0.14  0.14 | 0.21  0.17  0.15  0.15  0.14  0.14
LSAM   0.15  0.16  0.17  0.17  0.17  0.17 | 0.21  0.19  0.18  0.17  0.17  0.17
RME    0.06  0.02  0.01  0.01  0.01  0.00 | 0.18  0.11  0.07  0.05  0.03  0.01
Note. N = sample size; Method = estimation method; ML = maximum likelihood; ULS = unweighted least squares; SAM = structural after measurement; LSAM = local SAM; RME = robust moment estimation; absolute biases larger than 0.05 are printed in bold.
Table 9. Focused Simulation Study 4A (five factors): Average absolute bias of estimated factor correlation as a function of the common regression parameter β and the three data-generating models DGM1, DGM2, and DGM3 at the population level.
Method | β = 0.1, 0.2, 0.3, 0.4
Correctly specified model (DGM1)
ML     0.00  0.00  0.00  0.00
ULS    0.00  0.00  0.00  0.00
LSAM   0.00  0.00  0.00  0.00
RME    0.00  0.00  0.00  0.00
Cross-Loadings (DGM2)
ML     0.09  0.11  0.13  0.13
ULS    0.12  0.13  0.14  0.15
LSAM   0.06  0.07  0.08  0.08
RME    0.02  0.09  0.12  0.15
Residual correlations (DGM3)
ML     0.20  0.20  0.18  0.16
ULS    0.14  0.13  0.12  0.10
LSAM   0.17  0.16  0.15  0.13
RME    0.00  0.00  0.00  0.00
Note. Method = estimation method; ML = maximum likelihood; ULS = unweighted least squares; SAM = structural after measurement; LSAM = local SAM; RME = robust moment estimation; absolute biases larger than 0.05 are printed in bold.
Table 10. Simulation Study 5 (three factors and cross loadings): percentages of factor correlation estimates in data constellations in which SAM and SEM are unbiased (“no bias”), SAM is preferred over SEM (“SAM better”), SEM is preferred over SAM (“SEM better”), and both methods are biased (“both biased”).
Column blocks: all factor correlations | all factor correlations 0.20 | all factor correlations 0.30. Within each block, columns refer to (δ3, δ2) = (0, 0), (0, 0.4), (0.4, 0), (0.4, 0.4).
No bias       47.4  14.5  13.3   0.0 | 63.0  29.6  29.6   0.0 | 47.4  14.1  11.5   0.0
SAM better     9.4   6.7   7.0   1.7 | 14.8  13.6  12.3   4.9 |  0.0   0.0   0.0   0.0
SEM better     4.9   5.8  11.9   3.2 |  0.0   0.0   0.0   0.0 | 13.0  13.0  24.5   3.1
Both biased   38.3  73.0  67.8  95.0 | 22.2  56.8  58.0  95.1 | 39.6  72.9  64.1  96.9
Note. SEM was estimated using unweighted least squares (ULS). SAM was estimated using LSAM.
Table 11. Simulation Study 6 (three factors and residual correlations): percentages of factor correlation estimates in data constellations in which SAM and SEM are unbiased (“no bias”), SAM is preferred over SEM (“SAM better”), SEM is preferred over SAM (“SEM better”), and both methods are biased (“both biased”).
Column blocks: all factor correlations | all factor correlations 0.20 | all factor correlations 0.30. Within each block, columns refer to ψ = 0.2 and ψ = 0.4.
No bias       33.3  33.3 | 33.3  33.3 | 33.3  33.3
SAM better     0.0   3.1 |  0.0   4.9 |  0.0   0.0
SEM better     0.0   0.0 |  0.0   0.0 |  0.0   0.0
Both biased   66.7  63.6 | 66.7  61.7 | 66.7  66.7
Note. SEM was estimated using unweighted least squares (ULS). SAM was estimated using LSAM.