Segmentation in Structural Equation Modeling Using a Combination of Partial Least Squares and Modified Fuzzy Clustering

Mukid, Moch Abdul; Otok, Bambang Widjanarko; Suharsono, Agus

doi:10.3390/sym14112431


by Moch Abdul Mukid ¹,², Bambang Widjanarko Otok ¹,* and Agus Suharsono

¹ Department of Statistics, Faculty of Science and Data Analytics, Institute Teknologi Sepuluh Nopember, Jl. Arif Rahman Hakim, Surabaya 60111, Indonesia

² Department of Statistics, Faculty of Science and Mathematics, Diponegoro University, Jl. Prof. Sudarto No.13, Tembalang, Semarang 50275, Indonesia

* Author to whom correspondence should be addressed.

Symmetry 2022, 14(11), 2431; https://doi.org/10.3390/sym14112431

Submission received: 30 August 2022 / Revised: 31 October 2022 / Accepted: 9 November 2022 / Published: 16 November 2022


Abstract:

The application of structural equation modeling (SEM) assumes that all data follow a single model. This assumption may be inaccurate in certain cases because individuals tend to differ in their responses, and failure to consider heterogeneity may threaten the validity of the SEM results. This study focuses on unobserved heterogeneity, where the difference between two or more data sets does not depend on observable characteristics. We propose a new method for estimating SEM parameters when the data contain unobserved heterogeneity, assuming that the heterogeneity arises from both the outer model and the inner model. The method combines partial least squares (PLS) and modified fuzzy clustering. Initially, each observation is randomly assigned a weight in each segment. These weights are then updated iteratively by minimizing an objective function, namely the sum of the weighted squared residuals of the outer and inner models of PLS-SEM. We conducted a simulation study to evaluate the performance of the method by considering various factors, including the number of segments, model specifications, residual variance of endogenous latent variables, residual variance of indicators, population size, and distribution of latent variables. From the simulation study and an application to actual data, we conclude that the proposed method can classify observations into the correct segments and precisely estimate the SEM parameters in each segment.

Keywords:

unobserved heterogeneity; clusterwise PLS; path modeling

1. Introduction

Structural equation modeling (SEM) is a statistical method that is used to estimate a causal relationship, specifically connecting two or more latent concepts, each of which is measured through several indicator variables [1]. SEM has helped solve many substantive problems in the social and cultural sciences [2]. This model is currently widely used in marketing [3,4,5] and continues to expand to other areas such as sociology [6], education [7,8,9,10], health [11], management [12,13,14,15,16], and environment [17].

There are two different approaches for estimating the parameters of SEM, namely the covariance-based approach [18] and the component-based approach, which is also known as partial least squares (PLS) modeling [19]. In the covariance-based approach, the maximum likelihood method is normally used for parameter estimation, whereby the indicators are assumed to follow a multivariate normal distribution [2,18]. The PLS approach becomes an alternative if the multivariate normality assumption on the indicators is not fulfilled. Unlike the covariance-based approach, PLS focuses on estimating the latent variable scores [20].

In applying SEM, it is assumed that the data are homogeneous and follow a single model. This assumption may be inaccurate because individuals tend to vary in their responses. If the parameters of some segments differ significantly from those of other segments, the use of a single model on the aggregated data can be very misleading [21]. Failure to consider such heterogeneity can threaten the validity of the SEM results, leading to wrong conclusions. Discovering segments in the data that are characterized by different SEM models provides richer information, because the results are derived not only from the aggregated data but also from each segment. There are two types of heterogeneity: observed and unobserved. Unobserved heterogeneity arises when the difference between two or more data groups does not depend on observable characteristics [22]. It may be due to differences in the measurement and structural models, and it is one of the problems faced by social researchers [23]. When unobserved heterogeneity exists, the researcher cannot generalize the results of an aggregate-level analysis, but must account for differences in model relationships by establishing adequate segments of observations [24]. Uncovering this unobserved heterogeneity is a requirement for obtaining valid results when using SEM. Conventional segmentation methods usually fail in SEM because they only consider the indicator data and ignore the relationships between latent variables [25].

Within the context of SEM, several researchers have proposed segmentation techniques to address this heterogeneity, such as finite mixture SEM [23], finite mixture PLS [21,22,24], PLS typological regression [26], response-based unit segmentation PLS (REBUS-PLS) [27], and hierarchical Bayesian SEM [28]. In addition, a segmentation method based on a genetic algorithm (PLS GAS), which uses guided random search to find an optimal solution in a complex search space, was proposed by Ringle et al. [29]. There is also the PLS iterative reweighted regression segmentation (PLS IRRS) method, which is inspired by the M estimator used to reduce the effect of outliers in regression models [25].

Reviewing these segmentation techniques reveals weaknesses that limit their application in many research situations. For example, finite mixture PLS (FIMIX-PLS) [21] only considers heterogeneity in the structural model and imposes distributional assumptions on the endogenous latent variables, which is contrary to the nonparametric character of PLS path modeling. As an extension of PLS typological path modeling, the REBUS-PLS procedure overcomes some of the limitations of FIMIX-PLS [27]. REBUS-PLS, however, has weaknesses of its own. First, it can only be used for reflective measurement models. Second, it determines the initial partition and the number of segments by applying hierarchical clustering to the residuals of the structural and measurement models at the aggregate data level. This initial step is problematic because, for large data sets, it is not easy to read the number of segments from the resulting dendrogram [30]. PLS GAS, in turn, is slow: it takes more than one hour for calculations on simple models and can take longer for more complex models [25]. Meanwhile, PLS IRRS assumes, similarly to FIMIX-PLS, that unobserved heterogeneity arises only in the structural model.

We introduce a new method for SEM with data containing unobserved heterogeneity by combining PLS and a modified fuzzy clustering method in an integrated framework. At the initial stage, each object is assigned a random initial weight in each segment. Through an iterative process, the weights are updated until convergence is reached. We have developed a new formula for this weighting, which considers heterogeneity in both the structural and measurement models. Fuzzy clustering is an overlapping method that allows an object to be part of several segments [31,32,33]. The reasons for using fuzzy clustering to reveal data heterogeneity within the SEM framework can be seen in [34]. The success of fuzzy clustering in grouping data motivates us to study its use in PLS-SEM, where, to our knowledge, it has not previously been reported for finding data groups. In this study, we investigate how to combine fuzzy clustering and PLS so that they can reveal heterogeneity within the SEM framework. Fuzzy clustering is adapted to PLS-SEM by modifying the objective function and the procedure for updating the fuzzy membership values. Fuzzy clustering continues to be developed, which provides an opportunity to extend this method with more recent fuzzy clustering techniques. Examples of recent studies on fuzzy clustering can be seen in [30,35,36,37,38].

This paper is structured as follows: in Section 2, we provide a brief background on the PLS approach to estimate SEM parameters and the algebraic notation in which PLS-SEM is defined. In Section 3, the basic formulas of the PLSMFC model and a summary of the PLSMFC algorithm are presented. Section 4 reviews the results and discusses the performance of the PLSMFC, which was evaluated in a detailed simulation study and empirical application. Finally, Section 5 provides conclusions regarding the feasibility of our proposed method.

2. Partial Least Squares Method for Estimating SEM Parameters

The PLS approach to SEM has been proposed as a component-based estimation procedure that differs from the classical covariance-based approach [27]. It is an iterative algorithm that first solves the blocks of the measurement model separately and then, in a second step, estimates the path coefficients in the structural model. PLS-SEM is therefore claimed to explain the residual variance of the latent variables and indicator variables in each regression run in the model, which is why PLS path modeling is considered more of an exploratory than a confirmatory approach. In contrast to the classical covariance-based approach, PLS-SEM does not aim to reproduce the sample covariance matrix.

PLS-SEM is considered a soft modeling approach, where no strict assumptions are required. This is a desirable feature, especially in applied studies where such assumptions are difficult to fulfill [39]. The PLS method is built from a system of interdependent equations based on simple or multiple regression. Such a system estimates the network of relationships between latent variables and the relationships between the indicator variables and their associated latent variables. Lohmöller simplified the notation to make the PLS algorithm easier to describe [19]. Exogenous and endogenous latent variables are given the same symbol, $η$. The indicators of the endogenous and exogenous latent variables are also given the same symbol, $x$. Let $X$ be a data matrix of size $N \times p$, where $N$ is the number of observations and $p$ is the number of indicator variables. The indicators are partitioned into $J$ non-overlapping subsets, called blocks, namely $X_{1}, X_{2}, \dots, X_{J}$. Each block represents a latent variable $η_{j}$, $j = 1, 2, \dots, J$. Each block has $K_{j}$ indicators $x_{k_{j}}$ with $k_{j} = 1, 2, \dots, K_{j}$. Latent variables are assumed to be connected by one or more linear relations. All variables, both latent variables and indicators, are treated as standardized variables. In the context of PLS, the structural model is often called the inner model. The structural relationship associated with the $j^{*}$-th endogenous latent variable is written as

η_{j^{*} n} = \sum_{i = 1}^{Q_{j^{*}}} β_{j^{*} i} η_{i n} + ζ_{j^{*} n}, \quad j^{*} = 1, 2, \dots, J^{*}; \quad n = 1, 2, \dots, N

(1)

where $Q_{j^{*}}$ is the number of latent variables related to the $j^{*}$-th endogenous latent variable. In vector and matrix notation, Equation (1) can be expressed as

η_{j^{*}} = η_{\to j^{*}} β_{j^{*}} + ζ_{j^{*}}

(2)

The coefficient $β_{j^{*}}$ is a path coefficient vector that represents the strength and direction of the relationship between the response $η_{j^{*}}$ and the predictors $η_{\to j^{*}}$, and $ζ_{j^{*}}$ is the residual term of the inner model.

The measurement model in PLS is often called the outer model, which includes two measurement models: the reflective and the formative. The reflective measurement model is the most commonly used. In this case, the latent variable is considered the indicator’s cause. It is called reflective because the indicators ‘reflect’ the latent variable. The formative measurement model assumes that the indicator variable is the cause of the latent variable.

Next, we describe the PLS algorithm for estimating SEM parameters. The latent variable score $Y_{j}$ is estimated first, as a weighted sum of the indicator variables, as in Equation (3):

Y_{j} = \sum_{k_{j} = 1}^{K_{j}} w_{k_{j}} x_{k_{j}}

(3)

where $w_{k_{j}}$ is the outer weight related to indicator $x_{k_{j}}$. The weights are estimated using the least squares method, and there are two versions of this estimation process. In the first version, called mode A, each indicator variable is regressed on an instrumental variable $\tilde{Y}_{j}$:

x_{k_{j}} = {\tilde{w}}_{k_{j}} {\tilde{Y}}_{j} + e_{k_{j}}

(4)

The estimated weight is obtained by minimizing the residuals in the least squares sense. In the second version, called mode B, the instrumental variable $\tilde{Y}_{j}$ is regressed on the indicator variables:

{\tilde{Y}}_{j} = \sum_{k_{j} = 1}^{K_{j}} {\tilde{w}}_{k_{j}} x_{k_{j}} + d_{j}

(5)

The weights $w_{k_{j}}$ in Equation (3) are rescaled from $\tilde{w}_{k_{j}}$ so that the latent variable score has a variance of 1. The basic PLS algorithm [19] is summarized in Algorithm 1.

Algorithm 1 PLS Algorithm [19]

Step 1: Estimate the weights and scores of the latent variables. The inputs of this algorithm are the indicators and the initial values $\tilde{w}_{k_{j}}$. Sub-steps 1–4 are repeated until the weights of the indicators converge.

1. Outside approximation:

$Y_{j n} = f_{j} \sum_{k_{j} = 1}^{K_{j}} \tilde{w}_{k_{j}} x_{k_{j} n}$

2. Inner weight:

$v_{j i} = \begin{cases} \pm \operatorname{sign}(\operatorname{cor}(Y_{j}, Y_{i})), & Y_{j} \text{ and } Y_{i} \text{ connected} \\ 0, & \text{otherwise} \end{cases}$

3. Inside approximation:

$\tilde{Y}_{j} = \sum_{i = 1}^{Q_{j}} v_{j i} Y_{i}$

4. Outer weight:

$\begin{array}{ll} \tilde{Y}_{j n} = \sum_{k_{j} = 1}^{K_{j}} \tilde{w}_{k_{j}} x_{k_{j} n} + δ_{j n}, & \text{Mode B} \\ x_{k_{j} n} = \tilde{w}_{k_{j}} \tilde{Y}_{j n} + ϵ_{k_{j} n}, & \text{Mode A} \end{array}$

Step 2: Estimate path and loading coefficients using the ordinary least squares method.

Step 3: Estimate location parameter.
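Step 1 of Algorithm 1 can be sketched in a few lines of NumPy. The following is a minimal illustration only, assuming Mode A outer estimation, the centroid inner scheme, initial weights of one, and standardization as the scaling $f_{j}$; the function name `pls_scores` and its arguments are hypothetical and do not come from the paper.

```python
import numpy as np

def standardize(a):
    """Center and scale to unit variance (column-wise for matrices)."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

def pls_scores(blocks, connected, tol=1e-8, max_iter=300):
    """Sketch of Step 1 of the PLS algorithm (Mode A, centroid scheme).

    blocks    : list of (N, K_j) indicator matrices, one per latent variable
    connected : (J, J) symmetric 0/1 matrix, 1 where two latent variables are linked
    """
    blocks = [standardize(b) for b in blocks]
    weights = [np.ones(b.shape[1]) for b in blocks]  # assumed initial outer weights
    for _ in range(max_iter):
        # 1. Outside approximation: standardized weighted sum of the indicators
        Y = np.column_stack([standardize(b @ w) for b, w in zip(blocks, weights)])
        # 2. Inner weights: sign of the correlation for connected pairs, else 0
        V = np.sign(np.corrcoef(Y, rowvar=False)) * connected
        # 3. Inside approximation: column j is sum_i v_ji * Y_i
        Y_tilde = Y @ V.T
        # 4. Outer weights, Mode A: regress each indicator on the inner proxy
        new_weights = [b.T @ Y_tilde[:, j] / (Y_tilde[:, j] @ Y_tilde[:, j])
                       for j, b in enumerate(blocks)]
        change = max(np.max(np.abs(nw - w)) for nw, w in zip(new_weights, weights))
        weights = new_weights
        if change < tol:
            break
    Y = np.column_stack([standardize(b @ w) for b, w in zip(blocks, weights)])
    return Y, weights
```

The sketch assumes every latent variable is connected to at least one other, otherwise the inner proxy in sub-step 3 would be zero. Steps 2 and 3 of Algorithm 1 are not covered here.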

According to [15], there are three options for calculating the inner weight $v_{j i}$. The first is the centroid scheme. This scheme takes into account only the sign of the correlation between adjacent latent variables and does not consider the path strength. The inner weight $v_{j i}$ is the sign of the correlation between $Y_{j}$ and $Y_{i}$, i.e.,

v_{j i} = \begin{cases} \pm \operatorname{sign}(\operatorname{cor}(Y_{j}, Y_{i})), & \text{if } Y_{j} \text{ and } Y_{i} \text{ are connected} \\ 0, & \text{otherwise} \end{cases}

(6)

The second is the factor scheme. This scheme considers not only the sign but also the path strength in the structural model. The inner weight $v_{j i}$ is the correlation between $Y_{j}$ and $Y_{i}$, i.e.,

v_{j i} = \begin{cases} \operatorname{cor}(Y_{j}, Y_{i}), & \text{if } Y_{j} \text{ and } Y_{i} \text{ are connected} \\ 0, & \text{otherwise} \end{cases}

(7)

The third is the path scheme. A latent variable acts as a dependent variable if it is influenced by other latent variables, or as a predictor if it affects other latent variables, depending on the cause-and-effect relationship. If the latent variable $Y_{i}$ is a predictor of the latent variable $Y_{j}$, then the inner weight is the correlation between $Y_{i}$ and $Y_{j}$. On the other hand, if $Y_{i}$ is a dependent variable of the latent variable $Y_{j}$, then the inner weight is the regression coefficient $b_{j i}$ from the corresponding multiple regression.

v_{j i} = \begin{cases} \operatorname{cor}(Y_{j}, Y_{i}), & \text{if } Y_{i} \text{ is a predictor of } Y_{j} \\ b_{j i}, & \text{if } Y_{j} \text{ is a predictor of } Y_{i} \\ 0, & \text{otherwise} \end{cases}

(8)
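The centroid and factor schemes of Equations (6) and (7) differ only in whether the correlation or its sign is kept. A minimal sketch, assuming the latent variable scores are stored column-wise and the connections are given as a symmetric 0/1 matrix (the function and argument names are hypothetical; the path scheme of Equation (8) is left out because it also needs the direction of each path):

```python
import numpy as np

def inner_weights(Y, connected, scheme="centroid"):
    """Inner weights v_ji for the centroid or factor scheme.

    Y         : (N, J) matrix of latent variable scores
    connected : (J, J) symmetric 0/1 matrix, 1 where Y_j and Y_i are connected
    """
    R = np.corrcoef(Y, rowvar=False)   # cor(Y_j, Y_i) for all pairs
    if scheme == "centroid":
        V = np.sign(R)                 # Equation (6): direction only
    elif scheme == "factor":
        V = R                          # Equation (7): direction and strength
    else:
        raise ValueError("scheme must be 'centroid' or 'factor'")
    return V * connected               # zero for unconnected pairs
```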

3. Proposed Method

3.1. Segmentation in SEM Using a Combination of Partial Least Squares and Modified Fuzzy Clustering

In this study, we developed a new method for determining a structural equation model from data containing unobserved heterogeneity. The method, called PLSMFC, combines PLS and modified fuzzy clustering: PLS is used to estimate the SEM parameters, and modified fuzzy clustering is used to find the segments of the objects. We set the fuzzifier to m = 2, based on a previous study indicating that this value performs well in fuzzy clustering [34].

As in Section 2, the structural relationship associated with the $j^{*}$-th endogenous latent variable in the inner model is given by Equation (1). The reflective measurement model is represented by

x_{k_{j} n} = λ_{k_{j}} η_{j n} + ε_{k_{j} n} .

(9)

In vector and matrix notation, the reflective measurement model can be rewritten as

x_{k_{j}} = η_{j} λ_{k_{j}} + ε_{k_{j}}

(10)

The formative measurement model can be expressed by

η_{j n} = λ_{1} x_{1_{j} n} + λ_{2} x_{2_{j} n} + \dots + λ_{K_{j}} x_{K_{j} n} + δ_{j n}

(11)

In matrix and vector notation, the formative measurement model can be rewritten as

η_{j} = X_{j} Λ_{j} + δ_{j} .

(12)

The PLSMFC method finds the data segments after estimating the latent variable scores. After the first stage and before the second stage of the PLS algorithm, the number of segments is chosen and the initial weights are randomly determined. The number of segments can be set to 2, 3, 4, and so on. For a given number of segments, the weights of an object over all segments sum to 1. This weighting changes the way the loading and path coefficients of the SEM are estimated.

The criterion used by the PLSMFC method to find the segments and the SEM in each segment is to minimize the sum of the weighted squared residuals of the outer and inner models. The use of a residual distance as a substitute for the Euclidean distance in conventional fuzzy clustering has been investigated in [40,41]. Mathematically, the objective function used in this process is

F = \sum_{j^{*} = 1}^{J^{*}} \sum_{c = 1}^{C} \sum_{n = 1}^{N} u_{c n}^{2} ζ_{j^{*} c n}^{2} + \sum_{j = 1}^{J_{R}} \sum_{k = 1}^{K_{j}} \sum_{c = 1}^{C} \sum_{n = 1}^{N} u_{c n}^{2} ε_{j c k n}^{2} + \sum_{j = 1}^{J_{F}} \sum_{c = 1}^{C} \sum_{n = 1}^{N} u_{c n}^{2} δ_{j c n}^{2}

(13)

where $J^{*}$ is the number of endogenous latent variables, $J_{R}$ is the number of latent variables measured using the reflective model, and $J_{F}$ is the number of latent variables measured using the formative model. Here, $u_{c n}$ is the weight of object n in segment c, with $\sum_{c = 1}^{C} u_{c n} = 1$ for every n; $ζ_{j^{*} c n}$ is the residual of the inner model for the n-th observation and the $j^{*}$-th endogenous latent variable in segment c; $ε_{j c k n}$ is the residual of the outer model for the reflective measurement of the n-th observation on the k-th indicator of the j-th latent variable in segment c; and $δ_{j c n}$ is the residual of the outer model for the formative measurement of the n-th observation in segment c associated with the j-th latent variable. Equation (13) can be rewritten in vector and matrix notation as

F = \sum_{j^{*} = 1}^{J^{*}} \sum_{c = 1}^{C} ζ_{c j^{*}}^{T} U_{c} ζ_{c j^{*}} + \sum_{j = 1}^{J_{R}} \sum_{k = 1}^{K_{j}} \sum_{c = 1}^{C} ε_{cjk}^{T} U_{c} ε_{cjk} + \sum_{j = 1}^{J_{F}} \sum_{c = 1}^{C} δ_{cj}^{T} U_{c} δ_{cj}

(14)

where

U_{c} = \begin{bmatrix} u_{c 1}^{2} & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & u_{c N}^{2} \end{bmatrix}.

Equation (14) can be rewritten as

F = \sum_{j^{*} = 1}^{J^{*}} \sum_{c = 1}^{C} [{(y_{j^{*}} - Y_{\to j^{*}} β_{j^{*} c})}^{T} U_{c} (y_{j^{*}} - Y_{\to j^{*}} β_{j^{*} c})] + \sum_{j = 1}^{J_{R}} \sum_{k = 1}^{K_{j}} \sum_{c = 1}^{C} [{(x_{j k} - y_{j} λ_{j k c})}^{T} U_{c} (x_{j k} - y_{j} λ_{j k c})] + \sum_{j = 1}^{J_{F}} \sum_{c = 1}^{C} [{(y_{j} - X_{j} Λ_{j c})}^{T} U_{c} (y_{j} - X_{j} Λ_{j c})]

(15)

where $y_{j^{*}}$ is the vector of scores of the $j^{*}$-th endogenous latent variable, $Y_{\to j^{*}}$ is the matrix of scores of the latent variables related to the $j^{*}$-th endogenous latent variable, and $β_{j^{*} c}$ is the vector of path coefficients associated with the $j^{*}$-th endogenous latent variable in segment c. Furthermore, $x_{j k}$ is the vector of the k-th indicator of the j-th latent variable, $y_{j}$ is the vector of scores of the j-th latent variable, and $λ_{j k c}$ is the loading coefficient (reflective measurement model) of the k-th indicator of the j-th latent variable in segment c. Finally, $X_{j}$ is the matrix of indicators that affect the j-th latent variable, and $Λ_{j c}$ is the vector of loading coefficients (formative measurement model) associated with the j-th latent variable in segment c.
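Once the residuals are available, the objective function of Equation (13) reduces to a weighted sum of squares. The sketch below assumes, purely for illustration, that all inner, reflective, and formative residuals of an observation in a segment have been stacked along one axis of a single array; the function name `objective` and this array layout are not taken from the paper.

```python
import numpy as np

def objective(u, residuals):
    """Weighted sum of squared residuals, as in Equation (13).

    u         : (C, N) membership weights, each column summing to 1
    residuals : (C, N, R) array; for segment c and observation n, the R entries
                hold every inner and outer model residual (assumed layout)
    """
    d = np.sum(residuals ** 2, axis=2)   # (C, N) total squared residual
    return float(np.sum(u ** 2 * d))     # weights enter squared because m = 2
```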

3.2. Parameter Estimation of the Inner and Outer Model

The parameter estimators of the inner model (2) are determined using the Lagrange multiplier method, by minimizing the function in Equation (15) subject to $\sum_{c = 1}^{C} u_{c n} = 1$ for every n. The Lagrange function is given in Equation (16):

\begin{matrix} F^{*} = \sum_{j^{*} = 1}^{J^{*}} \sum_{c = 1}^{C} [{(y_{j^{*}} - Y_{\to j^{*}} β_{j^{*} c})}^{T} U_{c} (y_{j^{*}} - Y_{\to j^{*}} β_{j^{*} c})] \\ + \sum_{j = 1}^{J_{R}} \sum_{k = 1}^{K_{j}} \sum_{c = 1}^{C} [{(x_{j k} - y_{j} λ_{j k c})}^{T} U_{c} (x_{j k} - y_{j} λ_{j k c})] \\ + \sum_{j = 1}^{J_{F}} \sum_{c = 1}^{C} [{(y_{j} - X_{j} Λ_{j c})}^{T} U_{c} (y_{j} - X_{j} Λ_{j c})] + l (\sum_{c = 1}^{C} u_{c n} - 1) \end{matrix}

(16)

where l is the Lagrange multiplier.

The parameters of the inner model appear only in the first group of terms of $F^{*}$, so the second, third, and fourth groups do not contribute to the derivative. The estimator is obtained by setting the first derivative of $F^{*}$ with respect to $β_{j^{*} c}$ to zero, i.e., $\frac{\partial F^{*}}{\partial \hat{β}_{j^{*} c}} = 0$ for specific $j^{*}$ and c, which gives

2 Y_{\to j^{*}}^{T} U_{c} Y_{\to j^{*}} {\hat{β}}_{j^{*} c} - 2 Y_{\to j^{*}}^{T} U_{c} y_{j^{*}} = 0

{\hat{β}}_{j^{*} c} = {[Y_{\to j^{*}}^{T} U_{c} Y_{\to j^{*}}]}^{- 1} [Y_{\to j^{*}}^{T} U_{c} y_{j^{*}}]

(17)

The parameter estimators of the outer model for reflective measurement are determined in the same way. They are obtained by setting the first derivative of $F^{*}$ with respect to $λ_{j k c}$ to zero, namely $\frac{\partial F^{*}}{\partial \hat{λ}_{j k c}} = 0$ for specific j, k, and c. From Equation (16), we get

2 y_{j}^{T} U_{c} y_{j} {\hat{λ}}_{j k c} - 2 y_{j}^{T} U_{c} x_{j k} = 0

{\hat{λ}}_{j k c} = {[y_{j}^{T} U_{c} y_{j}]}^{- 1} [y_{j}^{T} U_{c} x_{j k}]

(18)

The estimators of the formative measurement model parameters are obtained in the same way. The parameters of the outer model for formative measurement all appear in the third group of terms. They are obtained by setting the first derivative of $F^{*}$ with respect to $Λ_{j c}$ to zero, namely $\frac{\partial F^{*}}{\partial \hat{Λ}_{j c}} = 0$ for specific j and c. From Equation (16), we get

2 X_{j}^{T} U_{c} X_{j} {\hat{Λ}}_{j c} - 2 X_{j}^{T} U_{c} y_{j} = 0

{\hat{Λ}}_{j c} = {[X_{j}^{T} U_{c} X_{j}]}^{- 1} [X_{j}^{T} U_{c} y_{j}]

(19)
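Equations (17)–(19) share one form: a weighted least squares solution in which observation n carries the weight $u_{c n}^{2}$. A minimal sketch of that common form follows, with a hypothetical helper name `weighted_ls`; it solves the normal equations directly instead of forming the inverse, which is a numerical choice made here and not something stated in the paper.

```python
import numpy as np

def weighted_ls(Z, y, u_c):
    """Solve (Z^T U_c Z) b = Z^T U_c y with U_c = diag(u_c ** 2).

    Z   : (N, p) design matrix (predictor scores, a single score column,
          or the indicator block, depending on the equation)
    y   : (N,) response vector
    u_c : (N,) membership weights of the observations in segment c
    """
    ZtU = Z.T * (u_c ** 2)               # same as Z.T @ diag(u_c ** 2)
    return np.linalg.solve(ZtU @ Z, ZtU @ y)
```

For Equation (17), `Z` would hold the predictor latent variable scores and `y` the endogenous scores; for Equation (18), `Z` is the single column `y_j[:, None]` and `y` is an indicator; for Equation (19), `Z` is the indicator block and `y` the latent variable score.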

Furthermore, the residuals of the inner and outer models can be calculated. The residual of the inner model associated with the $j^{*}$-th endogenous latent variable in the c-th segment is

{\hat{ζ}}_{j^{*} c} = y_{j^{*}} - Y_{\to j^{*}} {\hat{β}}_{j^{*} c}

(20)

The residual of the outer model associated with the reflective measurement of indicator $x_{j k}$ in the c-th segment is

{\hat{ε}}_{j k c} = x_{j k} - y_{j} {\hat{λ}}_{j k c} .

(21)

The residual of the outer model associated with the formative measurement of the j-th latent variable in segment c is

{\hat{δ}}_{j c} = y_{j} - X_{j} {\hat{Λ}}_{j c}

(22)

3.3. Fuzzy Membership and PLSMFC Algorithm

In this section, we derive the fuzzy membership formula, which is one of the key components that characterize the fuzzy clustering method. The formula is obtained using the Lagrange multiplier method. From Equation (13), with the residuals replaced by their estimates, the Lagrange function is

F^{*} = \sum_{j^{*} = 1}^{J^{*}} \sum_{c = 1}^{C} \sum_{n = 1}^{N} u_{c n}^{2} {\hat{ζ}}_{j^{*} c n}^{2} + \sum_{j = 1}^{J_{R}} \sum_{k = 1}^{K_{j}} \sum_{c = 1}^{C} \sum_{n = 1}^{N} u_{c n}^{2} {\hat{ε}}_{j c k n}^{2} + \sum_{j = 1}^{J_{F}} \sum_{c = 1}^{C} \sum_{n = 1}^{N} u_{c n}^{2} {\hat{δ}}_{j c n}^{2} + l (\sum_{c = 1}^{C} u_{c n} - 1)

Setting $\frac{\partial F^{*}}{\partial u_{c n}} = 0$ for particular c and n gives

\sum_{j^{*} = 1}^{J^{*}} 2 u_{c n} {\hat{ζ}}_{j^{*} c n}^{2} + \sum_{j = 1}^{J_{R}} \sum_{k = 1}^{K_{j}} 2 u_{c n} {\hat{ε}}_{j c k n}^{2} + \sum_{j = 1}^{J_{F}} 2 u_{c n} {\hat{δ}}_{j c n}^{2} + l = 0

2 u_{c n} [\sum_{j^{*} = 1}^{J^{*}} {\hat{ζ}}_{j^{*} c n}^{2} + \sum_{j = 1}^{J_{R}} \sum_{k = 1}^{K_{j}} {\hat{ε}}_{j c k n}^{2} + \sum_{j = 1}^{J_{F}} {\hat{δ}}_{j c n}^{2}] + l = 0

u_{c n} = - \frac{l}{2 [\sum_{j^{*} = 1}^{J^{*}} {\hat{ζ}}_{j^{*} c n}^{2} + \sum_{j = 1}^{J_{R}} \sum_{k = 1}^{K_{j}} {\hat{ε}}_{j c k n}^{2} + \sum_{j = 1}^{J_{F}} {\hat{δ}}_{j c n}^{2}]}

(23)

Because $\sum_{c = 1}^{C} u_{c n} = 1$ for a specific n, we obtain

l = \frac{1}{\sum_{c = 1}^{C} [- \frac{1}{2 [\sum_{j^{*} = 1}^{J^{*}} {\hat{ζ}}_{j^{*} c n}^{2} + \sum_{j = 1}^{J_{R}} \sum_{k = 1}^{K_{j}} {\hat{ε}}_{j c k n}^{2} + \sum_{j = 1}^{J_{F}} {\hat{δ}}_{j c n}^{2}]}]}

Finally, substituting this value of l into Equation (23), the two minus signs cancel, and we obtain the formula for updating the fuzzy membership of the n-th object:

u_{c n} = {[\sum_{h = 1}^{C} \frac{\sum_{j^{*} = 1}^{J^{*}} {\hat{ζ}}_{j^{*} c n}^{2} + \sum_{j = 1}^{J_{R}} \sum_{k = 1}^{K_{j}} {\hat{ε}}_{j c k n}^{2} + \sum_{j = 1}^{J_{F}} {\hat{δ}}_{j c n}^{2}}{\sum_{j^{*} = 1}^{J^{*}} {\hat{ζ}}_{j^{*} h n}^{2} + \sum_{j = 1}^{J_{R}} \sum_{k = 1}^{K_{j}} {\hat{ε}}_{j h k n}^{2} + \sum_{j = 1}^{J_{F}} {\hat{δ}}_{j h n}^{2}}]}^{- 1}

(24)
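Substituting the multiplier l into Equation (23) shows that the update amounts to normalizing the reciprocal of each segment's total squared residual: writing $D_{c n}$ for the bracketed sum of squared residuals of observation n in segment c (a shorthand introduced here, not in the paper), $u_{c n} = D_{c n}^{-1} / \sum_{h = 1}^{C} D_{h n}^{-1}$. A minimal sketch under that reading, with a small floor added as an assumption to avoid division by zero:

```python
import numpy as np

def update_membership(d, eps=1e-12):
    """Membership update for fuzzifier m = 2.

    d   : (C, N) total squared residual D_cn of each observation in each segment
    eps : floor guarding against a zero residual (an assumption of this sketch)
    """
    inv = 1.0 / np.maximum(d, eps)                   # 1 / D_cn
    return inv / inv.sum(axis=0, keepdims=True)      # columns sum to 1
```

An observation that fits one segment's model much better than the others receives a membership close to 1 in that segment.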

To characterize the heterogeneity of the SEM, minimization of the objective function as in Equation (13) is carried out using the following algorithm (Algorithm 2).

Algorithm 2 PLSMFC Algorithm (author’s own contribution)

Step 1: Estimate the weights and scores of the latent variables using all data. The inputs of this algorithm are the indicator data and the initial values $\tilde{w}_{k_{j}}$. Sub-steps 1–4 are repeated until the weights of the indicators converge.

1. Outside approximation:

$Y_{j n} = f_{j} \sum_{k_{j} = 1}^{K_{j}} \tilde{w}_{k_{j}} x_{k_{j} n}$

2. Inner weight:

$v_{j i} = \begin{cases} \pm \operatorname{sign}(\operatorname{cor}(Y_{j}, Y_{i})), & Y_{j} \text{ and } Y_{i} \text{ connected} \\ 0, & \text{otherwise} \end{cases}$

3. Inside approximation:

$\tilde{Y}_{j} = \sum_{i = 1}^{Q_{j}} v_{j i} Y_{i}$

4. Outer weight:

$\begin{array}{ll} \tilde{Y}_{j n} = \sum_{k_{j} = 1}^{K_{j}} \tilde{w}_{k_{j}} x_{k_{j} n} + δ_{j n}, & \text{Mode B} \\ x_{k_{j} n} = \tilde{w}_{k_{j}} \tilde{Y}_{j n} + ϵ_{k_{j} n}, & \text{Mode A} \end{array}$

Step 2: Set the number of segments C, the initial fuzzy membership value $u_{c n}$ of the n-th object in segment c, and the convergence threshold $Δ$.

Step 3: Estimate path and loading coefficients using Equations (17)–(19).

Step 4: Calculate the residuals of the inner model and outer model in the c-th segment using Equations (20)–(22).

Step 5: Update the fuzzy membership value of the n-th observation in the c-th segment using Equation (24).

Step 6: Calculate the objective function (F) using Equation (13).

Step 7: If $| F^{(s + 1)} - F^{(s)} | < Δ$, go to Step 8. Otherwise, go back to Step 3.

Step 8: Repeat Steps 1 through 7 for a different number of segments.
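Steps 2 to 7 of Algorithm 2 can be sketched as an alternating loop. This is an illustrative sketch, not the authors' implementation: it assumes the latent variable scores from Step 1 are already available as a matrix `Y`, and the function name, the way the model is passed in (`inner`, `reflective`, `formative`), and the optional starting memberships `u0` are all hypothetical.

```python
import numpy as np

def wls(Z, y, w):
    """Weighted least squares with observation weights w (= u_cn squared)."""
    ZtW = Z.T * w
    return np.linalg.solve(ZtW @ Z, ZtW @ y)

def plsmfc_segments(Y, inner, reflective=(), formative=(), C=2,
                    delta=1e-8, max_iter=500, u0=None, seed=0):
    """Sketch of Steps 2-7 of the PLSMFC algorithm.

    Y          : (N, J) latent variable scores from Step 1
    inner      : list of (target index, list of predictor indices)
    reflective : list of (latent index, (N, K) indicator block)
    formative  : list of (latent index, (N, K) indicator block)
    """
    N = Y.shape[0]
    # Step 2: initial memberships, random unless supplied
    if u0 is None:
        u = np.random.default_rng(seed).random((C, N))
    else:
        u = np.asarray(u0, dtype=float).copy()
    u /= u.sum(axis=0)
    # Every model equation becomes one (design, response) regression task
    tasks = [(Y[:, preds], Y[:, tgt]) for tgt, preds in inner]
    tasks += [(Y[:, [j]], X[:, k]) for j, X in reflective for k in range(X.shape[1])]
    tasks += [(X, Y[:, j]) for j, X in formative]
    F_old = np.inf
    for _ in range(max_iter):
        d = np.zeros((C, N))
        coefs = []
        for c in range(C):
            w = u[c] ** 2
            coefs_c = []
            for Z, y in tasks:
                b = wls(Z, y, w)              # Step 3: Equations (17)-(19)
                d[c] += (y - Z @ b) ** 2      # Step 4: squared residuals
                coefs_c.append(b)
            coefs.append(coefs_c)
        inv = 1.0 / np.maximum(d, 1e-12)
        u = inv / inv.sum(axis=0)             # Step 5: membership update
        F = float(np.sum(u ** 2 * d))         # Step 6: objective function
        if abs(F_old - F) < delta:            # Step 7: convergence check
            break
        F_old = F
    return u, coefs, F
```

Step 8 would call this function for several values of `C` and compare the resulting partitions. Because the result depends on the starting memberships, running it from several random starts is a reasonable precaution.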

In this paper, segment validity is measured using the fuzziness performance index (FPI) and the normalized classification entropy (NCE) [34,42]. The FPI is defined as

\mathrm{FPI} = 1 - \frac{C \cdot \mathrm{PC} - 1}{C - 1}

(25)

where PC is the partition coefficient, defined by

\mathrm{PC} = \frac{1}{N} \sum_{n = 1}^{N} \sum_{c = 1}^{C} u_{c n}^{2}

and C is the number of segments. The NCE is defined as

\mathrm{NCE} = \frac{\mathrm{PE}}{\log C}

(26)

where PE is the partition entropy, defined by

\mathrm{PE} = - \frac{1}{N} \sum_{n = 1}^{N} \sum_{c = 1}^{C} u_{c n} \log u_{c n}

The smaller the FPI and NCE values, the better the segments separate the objects from one another.
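Both validity measures are simple functions of the membership matrix. The sketch below uses the commonly stated form FPI = 1 − (C·PC − 1)/(C − 1), which is 0 for a crisp partition and 1 when all memberships equal 1/C; the function names and the small floor inside the logarithm are assumptions of this sketch.

```python
import numpy as np

def fpi(u):
    """Fuzziness performance index for a (C, N) membership matrix."""
    C, N = u.shape
    pc = np.sum(u ** 2) / N                    # partition coefficient
    return 1.0 - (C * pc - 1.0) / (C - 1.0)

def nce(u, eps=1e-12):
    """Normalized classification entropy for a (C, N) membership matrix."""
    C, N = u.shape
    pe = -np.sum(u * np.log(np.maximum(u, eps))) / N   # partition entropy
    return pe / np.log(C)
```

Comparing these two values across candidate numbers of segments gives a simple way to pick the partition with the clearest separation.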

4. Results and Discussions

A simulation study and applications on real data were conducted to evaluate the performance of the PLSMFC method. Specifically, this simulation aimed to determine how effective this method is in reallocating segment membership and reestimating SEM parameters in different population sizes. The simulation design considers various factors, including the number of segments, model specifications, distribution of latent variables, residual variance of endogenous latent variables, variance of residual indicator variables, and population size.

4.1. Design of Simulation and Data Generating Process

The number of segments has two levels: 2 and 3. The model specification also has two levels, Model 1 and Model 2. Model 1 is an SEM that contains only reflective measurement models, whereas Model 2 contains both reflective and formative measurement models. The two models are shown in Figure 1 and Figure 2. The distribution of the latent variable scores and indicators has two levels: the standard normal distribution N(0,1), which is symmetric, and the beta distribution B(4,9), which is asymmetric. The residuals of the endogenous latent variables follow a normal distribution $N(0, σ^{2})$, where the variance is set at 5%, 10%, and 20%. Each measurement model has the same number of indicators, namely three. The loading coefficients for 2 and 3 segments are the same in Models 1 and 2; they are listed in Table 1. The residuals of the indicator variables are also normally distributed, $N(0, σ^{2})$, where the variance is set at 5%, 10%, and 20%. The population size has three levels, namely 50 (small), 200 (medium), and 1000 (large), and the segments have equal sizes. Overall, this study involved 2 × 2 × 2 × 3 × 3 × 3 = 216 combinations. Each combination was replicated 100 times using different initial weights to reduce the risk of converging to a local minimum.

Table 1 shows the SEM parameters in each segment. If the number of segments is two, the SEM parameters are those in the Segment 1 and Segment 2 columns; if the number of segments is three, they are those in the Segment 1, Segment 2, and Segment 3 columns. Simulation data were obtained by following a two-step procedure. As an illustration, suppose we generate data from two segments using Model 1. The first step is to produce the latent variable scores. The exogenous latent variable scores $η_{1}$ and $η_{2}$ were generated from one of two distributions, namely the standard normal distribution N(0,1) or the asymmetric beta distribution B(4,9). The scores of the latent variable $η_{3}$ were obtained from the inner model specification of Model 1, $η_{3} = β_{1} η_{1} + β_{2} η_{2} + ζ_{3}$. After $η_{1}$, $η_{2}$, and $ζ_{3}$ are generated, the scores $η_{3}$ can be computed. The second step is to generate the data for the indicators $x_{31}$, $x_{32}$, and $x_{33}$ by first generating the residuals $ε_{31}$, $ε_{32}$, and $ε_{33}$ from the distribution N(0, $σ^{2}$). Using the outer model specification below, the values $x_{31}$, $x_{32}$, and $x_{33}$ can be obtained.

$$\begin{aligned}
x_{31} &= λ_{31} η_{3} + ε_{31} \\
x_{32} &= λ_{32} η_{3} + ε_{32} \\
x_{33} &= λ_{33} η_{3} + ε_{33}
\end{aligned}$$
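The two-step procedure for Model 1 can be sketched in code. This is an illustration under stated assumptions, not the authors' implementation: the function name and arguments are hypothetical, only the endogenous block described in the text is generated, and a residual variance of "5%" is interpreted as a variance of 0.05.

```python
import math
import random

def generate_model1_segment(n, beta, lam3, var_inner=0.05, var_outer=0.05,
                            dist="normal", seed=0):
    """Generate eta_3 scores and its three reflective indicators for one segment."""
    rng = random.Random(seed)

    def draw_latent():
        # Exogenous scores: N(0,1) or the asymmetric beta B(4,9)
        return rng.gauss(0.0, 1.0) if dist == "normal" else rng.betavariate(4, 9)

    rows = []
    for _ in range(n):
        # Step 1: latent scores from the inner model
        eta1, eta2 = draw_latent(), draw_latent()
        zeta3 = rng.gauss(0.0, math.sqrt(var_inner))
        eta3 = beta[0] * eta1 + beta[1] * eta2 + zeta3
        # Step 2: indicators from the outer model x_3j = lambda_3j * eta_3 + eps_3j
        x3 = [lam * eta3 + rng.gauss(0.0, math.sqrt(var_outer)) for lam in lam3]
        rows.append({"eta1": eta1, "eta2": eta2, "eta3": eta3, "x3": x3})
    return rows

# Table 1 values for segments 1 and 2, with balanced segment sizes (N = 200 in total)
seg1 = generate_model1_segment(100, beta=(0.50, 0.50), lam3=(0.50, 0.55, 0.60), seed=1)
seg2 = generate_model1_segment(100, beta=(0.70, 0.70), lam3=(0.65, 0.70, 0.75), seed=2)
```

Concatenating `seg1` and `seg2` while recording the originating segment of each row gives a data set in which the true segment membership is known, which is what the hit ratio later relies on.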

The data for Model 2 were obtained by first generating the indicators $x_{11}$, $x_{12}$, $x_{13}$, $x_{21}$, $x_{22}$, and $x_{23}$ from the standard normal distribution N(0,1) or the beta distribution B(4,9). Then, the residuals $δ_{1}$ and $δ_{2}$ were generated from the distribution N(0, $σ^{2}$). The scores of the exogenous latent variables $η_{1}$ and $η_{2}$ were generated by following the specification of the outer model:

$$\begin{aligned}
η_{1} &= λ_{11} x_{11} + λ_{12} x_{12} + λ_{13} x_{13} + δ_{1} \\
η_{2} &= λ_{21} x_{21} + λ_{22} x_{22} + λ_{23} x_{23} + δ_{2}
\end{aligned}$$

Furthermore, the data for the indicators $x_{31}$, $x_{32}$, and $x_{33}$ were generated by first drawing the residuals $ε_{31}$, $ε_{32}$, and $ε_{33}$ from the distribution N(0, $σ^{2}$) and then applying the specification of the outer model, i.e.,

$$\begin{aligned}
x_{31} &= λ_{31} η_{3} + ε_{31} \\
x_{32} &= λ_{32} η_{3} + ε_{32} \\
x_{33} &= λ_{33} η_{3} + ε_{33}
\end{aligned}$$
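The Model 2 procedure differs from Model 1 in that the exogenous indicators are drawn first and the latent scores are formed from them. The sketch below illustrates this for a single observation. Several points are assumptions rather than statements from the text: the function name is hypothetical, $δ_{1}$ and $δ_{2}$ are given the indicator residual variance, $η_{3}$ is assumed to follow the same inner model as in Model 1, and "5%" is interpreted as a variance of 0.05.

```python
import math
import random

def generate_model2_observation(rng, lam1, lam2, lam3, beta,
                                var_inner=0.05, var_outer=0.05, dist="normal"):
    """One observation with formative exogenous blocks and a reflective endogenous block."""
    def draw():
        return rng.gauss(0.0, 1.0) if dist == "normal" else rng.betavariate(4, 9)

    def noise(variance):
        return rng.gauss(0.0, math.sqrt(variance))

    # Formative blocks: indicators are generated first
    x1 = [draw() for _ in lam1]
    x2 = [draw() for _ in lam2]
    # eta_j = sum_k lambda_jk * x_jk + delta_j
    eta1 = sum(l * x for l, x in zip(lam1, x1)) + noise(var_outer)
    eta2 = sum(l * x for l, x in zip(lam2, x2)) + noise(var_outer)
    # Inner model (assumed identical to Model 1)
    eta3 = beta[0] * eta1 + beta[1] * eta2 + noise(var_inner)
    # Reflective block: x_3j = lambda_3j * eta_3 + eps_3j
    x3 = [l * eta3 + noise(var_outer) for l in lam3]
    return {"x1": x1, "x2": x2, "x3": x3, "eta1": eta1, "eta2": eta2, "eta3": eta3}

rng = random.Random(0)
obs = generate_model2_observation(
    rng,
    lam1=(0.50, 0.55, 0.60), lam2=(0.50, 0.55, 0.60), lam3=(0.50, 0.55, 0.60),
    beta=(0.50, 0.50),
)
```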

4.2. Simulation Results

This simulation study examines the quality of the segmentation solution generated by the PLSMFC method in terms of how many observations are correctly reallocated. It also analyzes how well the SEM parameters in each segment are estimated. Each observation is reallocated to the segment in which it has the largest weight. If an observation is generated in segment one and the PLSMFC method reallocates it to segment one, the observation has been correctly reallocated. The hit ratio measures the proportion of observations generated in a particular segment that are assigned back to that segment by the PLSMFC method.
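The reallocation rule and the hit ratio can be written compactly. The following is a minimal sketch; the function name and the toy weights are illustrative, and the segment labels produced by the clustering step are assumed to be already aligned with the labels used during data generation.

```python
def hit_ratio(true_segments, weights):
    """Share of observations whose largest membership weight points to the true segment."""
    # Reallocation: each observation goes to the segment with the largest weight
    assigned = [max(range(len(w)), key=w.__getitem__) for w in weights]
    hits = sum(1 for t, a in zip(true_segments, assigned) if t == a)
    return hits / len(true_segments)

true_segments = [0, 0, 1, 1]
weights = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.3, 0.7]]
print(hit_ratio(true_segments, weights))  # 0.75
```

In the toy example the second observation is reallocated to the wrong segment, so three of four observations are hits.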

Figure 3 shows the hit ratio for each factor used in this study in both Models 1 and 2. The hit ratio follows the same pattern in Models 1 and 2. For the number of segments, the hit ratio with 2 segments tends to be higher than with 3 segments. For the population size, the larger the population, the greater the hit ratio. For the residual variance of the endogenous latent variables, the greater the variance, the smaller the hit ratio. Likewise, for the residual variance of the indicators, the greater the variance, the smaller the hit ratio. The distribution of the latent variables shows the same trend in both models: the hit ratio obtained with N(0,1) generally tends to be larger than that obtained with B(4,9). In addition, at all levels of each factor, Model 2 tends to obtain a higher hit ratio than Model 1. However, the performance of the PLSMFC method is determined by the interaction of the factors used in this study.

Figure 4 is a scatter plot of the hit ratio for each combination of factor levels in Model 1. The highest hit ratio in Model 1 is achieved with 2 segments, a population size of 200, a residual variance of endogenous variables of 5%, a residual variance of indicator variables of 5%, and latent variables distributed as N(0,1). This combination results in a hit ratio of 98.17%. The lowest hit ratio in Model 1 is obtained with 2 segments, a population size of 200, a residual variance of endogenous variables of 5%, a residual variance of indicator variables of 20%, and latent variables distributed as N(0,1). This combination produces a hit ratio of 57.73%. Figure 5 shows the hit ratio for each combination of factor levels in Model 2. The highest hit ratio in Model 2 is achieved with 3 segments, a population size of 1000, a residual variance of endogenous variables of 5%, a residual variance of indicator variables of 5%, and indicators distributed as B(4,9). This combination produces a hit ratio of 99.88%. The lowest hit ratio in Model 2 is obtained with 3 segments, a population size of 50, a residual variance of endogenous variables of 20%, a residual variance of indicator variables of 20%, and latent variables distributed as B(4,9). This combination produces a hit ratio of 64.42%. We conclude that our proposed method shows adequate performance.

Table 2 reports the mean parameter estimates of Model 1 obtained when the distribution of latent variables is N(0,1), the residual variance of the endogenous latent variables is 5%, the residual variance of indicators is 5%, and the number of segments is 2. Table 3 reports the mean parameter estimates of Model 2 under the same conditions. The means were computed over 100 replications. The tables show that the PLSMFC method generally recovers the parameters of Models 1 and 2 well. In Model 1 all measurement models are reflective, whereas Model 2 mixes reflective and formative measurement models. This shows that the method works satisfactorily for both specifications of the SEM. The standard error of the estimated parameter values decreases as the sample size increases. In addition, the PLSMFC method works well even with a small population size.

4.3. Application on Real Data

This section explains the use of the PLSMFC method to find heterogeneity in job performance data. The first step in the PLSMFC application is to estimate the latent variable scores. The inner weighting scheme we use is the centroid scheme, as in Equation (6); however, other schemes can also be used. The results are compared with those of the REBUS-PLS method. REBUS-PLS was chosen as a comparison because it shares the perspective of the PLSMFC method, in which unobserved heterogeneity arises from both the inner and outer models. This differs from the FIMIX PLS and PLS IRRS methods, in which the unobserved heterogeneity is attributed entirely to the inner model [25,27]. The data for the application were obtained from the R documentation and can be retrieved with the R code read.csv('https://articledatas3.s3.eu-central-1.amazonaws.com/StructuralEquationModelingData.csv') (accessed 15 July 2022). In this study, Job_Performance was measured by three indicators: Client_Sat, which is the satisfaction value of the main client with a range of 1 to 100; Super_Sat, which rates job performance according to superiors with a range of 1 to 100; and Proj_Compl, which is the percentage of completed projects. The hypothesis of this study is that work performance is strongly influenced by three other latent variables, namely employee social skills, intellectual skills, and motivation. None of these variables can be measured directly; therefore, indicators must be specified for each. The social skills construct is based on two measured variables: Psych_Test1, a psychological test score with a range of 1–100, and Psych_Test2, which also has a score range of 1–100. The intellectual skills construct is based on two measured variables: Years_Edu, the number of years of higher education, and IQ, the score on an IQ test. The motivation construct is based on two measured variables, namely Hrs_Train, the number of hours spent on training, and Hrs_Work, the mean number of hours in a working week. All constructs were modeled using reflective measurement. Figure 6 is a path diagram that illustrates the relationships among the latent variables.
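The model structure described above can be summarized as a small specification. This is only an illustrative encoding: the dictionary layout is an assumption, the construct labels follow Table 4, and the indicator names are those given in the text, which may not match the column headers of the CSV file exactly.

```python
# Outer (measurement) model: construct -> reflective indicators
measurement_model = {
    "Social": ["Psych_Test1", "Psych_Test2"],
    "Intellect": ["Years_Edu", "IQ"],
    "Motivation": ["Hrs_Train", "Hrs_Work"],
    "Job_Performance": ["Client_Sat", "Super_Sat", "Proj_Compl"],
}

# Inner (structural) model: endogenous construct -> its predictors
structural_model = {
    "Job_Performance": ["Social", "Intellect", "Motivation"],
}

# Constructs that are never predicted by another construct are exogenous
exogenous = [c for c in measurement_model if c not in structural_model]
n_indicators = sum(len(v) for v in measurement_model.values())
```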

As with other fuzzy clustering methods, PLSMFC requires the number of segments to be specified in advance, and the appropriate number was evaluated using the FPI and NCE indices. The formulas for FPI and NCE are shown in Equations (21) and (22), respectively. Figure 7 shows the FPI and NCE values obtained after the PLSMFC method was run with various numbers of segments. The optimum number of segments is the one for which the FPI and NCE values are lowest. Applying the PLSMFC algorithm to the job performance data, we identified the optimal number of segments as 2, with an FPI value of 1.9338 and an NCE value of 0.9502, as shown in Figure 7. Therefore, heterogeneity in the SEM of the job performance data is analyzed with two segments.
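To make the two indices concrete, the sketch below computes them from a membership matrix using definitions that are common in the fuzzy clustering literature. It is an assumption that these match Equations (21) and (22); the formulas used in this study may be scaled differently. Under the definitions below, both indices equal 0 for a crisp partition and 1 for a completely fuzzy one.

```python
import math

def fpi(u):
    """Fuzziness performance index: 1 - (K*F - 1)/(K - 1), F = partition coefficient."""
    n, k = len(u), len(u[0])
    f = sum(w * w for row in u for w in row) / n
    return 1.0 - (k * f - 1.0) / (k - 1.0)

def nce(u):
    """Normalized classification entropy: H / log K, H = mean membership entropy."""
    n, k = len(u), len(u[0])
    h = -sum(w * math.log(w) for row in u for w in row if w > 0) / n
    return h / math.log(k)

crisp = [[1.0, 0.0], [0.0, 1.0]]
fuzzy = [[0.5, 0.5], [0.5, 0.5]]
print(fpi(crisp), nce(crisp))
print(fpi(fuzzy), nce(fuzzy))
```

Running the segmentation for several candidate numbers of segments and keeping the one with the lowest index values reproduces the selection rule described in the text.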

Table 4 shows the estimated values of the parameters in each segment. To evaluate the significance of the parameter estimates, we applied a bootstrap with 500 resamples. The standard errors and critical ratios also appear in Table 4. All path coefficients in the table are significant at the 5% significance level. Work motivation positively influences work performance in segments 1 and 2, and its effect is much larger than that of social and intellectual skills. However, the influence of motivation differs in magnitude between the segments: the motivational factor contributes more to work performance in segment 1 than in segment 2. The path coefficient of motivation in segment 1 reaches 0.8179, whereas in segment 2 it is only 0.7917. Furthermore, the coefficient of determination in segment 1 is 92.07%, which indicates that 92.07% of the variation in the work performance construct can be explained by social skills, intellectual skills, and work motivation. The coefficient of determination in segment 2 is 92.02%, which indicates that 92.02% of the variation in the work performance construct can be explained by social skills, intellectual skills, and work motivation, and 7.98% is explained by other constructs not considered in this study.
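The bootstrap step can be illustrated generically. In the sketch below, a least-squares slope on synthetic data stands in for a path coefficient; in the actual procedure the estimator would be the full PLS estimation within a segment, which is not reproduced here. The function names and the synthetic data are assumptions made for illustration.

```python
import random
import statistics

def bootstrap_se(data, estimator, n_resamples=500, seed=0):
    """Standard deviation of an estimator over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)

def slope(pairs):
    """Least-squares slope of y on x; a stand-in for a path coefficient."""
    mean_x = statistics.fmean(x for x, _ in pairs)
    mean_y = statistics.fmean(y for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx

# Synthetic illustration data (not the job performance data)
rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(100)]
pairs = [(x, 0.8 * x + rng.gauss(0, 0.3)) for x in xs]

estimate = slope(pairs)
se = bootstrap_se(pairs, slope)
critical_ratio = estimate / se
```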

The loading coefficients in this job performance study are significant for almost all indicators in each segment; the exception is the loading of the client satisfaction indicator in segment 1, which is not significant. In segment 1, client satisfaction is therefore not a suitable indicator of work performance. The loading coefficient for client satisfaction in segment 1 is only 0.3760 with a standard error of 0.2693, whereas in segment 2 it is 0.9142 with a standard error of 0.2693. This shows that heterogeneity in the data may arise not only from the structural model but also from differing measurement models. This result differs from those of researchers who assumed that unobserved heterogeneity stems only from the structural model [21,22,25].
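The significance decision for the client satisfaction loading can be checked with simple arithmetic, since the critical ratio is the estimate divided by its bootstrap standard error. The sketch below uses the values from Table 4; the threshold of 1.96 (the two-sided 5% critical value of the standard normal distribution) is an assumption about the decision rule, and the function names are illustrative.

```python
def critical_ratio(estimate, standard_error):
    """Estimate divided by its (bootstrap) standard error."""
    return estimate / standard_error

def is_significant(estimate, standard_error, threshold=1.96):
    # Assumed rule: |critical ratio| above the two-sided 5% normal critical value
    return abs(critical_ratio(estimate, standard_error)) > threshold

# Client_Sat loadings and standard errors from Table 4
cr_segment1 = critical_ratio(0.3760, 0.2693)   # about 1.40
cr_segment2 = critical_ratio(0.9142, 0.2693)   # about 3.39
print(is_significant(0.3760, 0.2693), is_significant(0.9142, 0.2693))  # False True
```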

The REBUS-PLS method was also used to find segments in the same data. Using the same number of segments as the PLSMFC method (i.e., 2), unobserved heterogeneity in the job performance data was identified. Table 5 reports the estimated values of the parameters in each segment. To evaluate the significance of the parameter estimates, we applied a bootstrap with 500 resamples. Table 5 shows that the loading and path coefficients in segments 1 and 2 are numerically different. All indicator loadings in the measurement model are significant at the 5% significance level in both segment 1 and segment 2. The path coefficients of the inner model in segments 1 and 2 also show a significant effect of the exogenous latent variables on job performance at the 5% significance level. The coefficient of determination of job performance in segment 1 is 91.34%, which means that 91.34% of the variation in the endogenous latent variable job performance can be explained by the social, intellectual, and motivational variables. In segment 2, the coefficient of determination of the endogenous latent variable job performance is 91.56%. This demonstrates that 91.56% of the variability in job performance can be explained by the exogenous latent variables used in this study, while the remaining 8.44% is explained by other latent variables that were not considered in this study.

Table 4 and Table 5 summarize the results obtained after the PLSMFC and REBUS-PLS methods were applied to the job performance data. The PLSMFC method is more sensitive in detecting the significance of the model parameters. Under PLSMFC, the client satisfaction indicator in segment 1 is not a good measure of the endogenous latent variable job performance, whereas the REBUS-PLS method shows the opposite result. In the evaluation of the inner model, the PLSMFC method also produces better results than the REBUS-PLS method: the coefficient of determination produced by the PLSMFC method is higher than that produced by the REBUS-PLS method.

4.4. Future Research

This study introduces PLSMFC, a new segmentation method for PLS-SEM that makes it possible to reveal unobserved heterogeneity within the data. The method finds the segments based on the residuals obtained from the measurement and structural models. The sum of the squared weighted residuals is the objective function to be minimized. This is a natural choice because the SEM in each segment is itself obtained by minimizing the residuals using weighted least squares.

The PLSMFC method offers flexibility. If using both sets of residuals does not work well computationally, the objective function can be reduced by removing the residuals of the measurement model, so that it becomes a function of the residuals of the structural model only. Removing a residual component from the objective function results in a change in the fuzzy membership formula. This perspective is the same as that of FIMIX PLS and PLS IRRS, where the heterogeneity of the data is considered to arise only from the structural model [16,20]. The PLSMFC method does not depend on distributional assumptions and can reveal heterogeneity in both reflective and formative measurement models.
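The role of the two residual components can be illustrated with a fuzzy c-means style update in which the residual sum of squares replaces the squared Euclidean distance. This is a sketch by analogy, not the membership formula of this study, which is defined in the method section and may differ in detail. The function names, the `use_outer` switch, and the small constant `eps` are assumptions introduced for illustration.

```python
def update_memberships(outer_sq, inner_sq, m=2.0, use_outer=True, eps=1e-12):
    """outer_sq[i][k], inner_sq[i][k]: summed squared residuals of observation i
    under the measurement and structural models of segment k."""
    memberships = []
    for o_row, s_row in zip(outer_sq, inner_sq):
        # Residual-based "distance" of the observation to each segment model
        d = [(o if use_outer else 0.0) + s + eps for o, s in zip(o_row, s_row)]
        inv = [x ** (-1.0 / (m - 1.0)) for x in d]
        total = sum(inv)
        memberships.append([v / total for v in inv])
    return memberships

def objective(memberships, outer_sq, inner_sq, m=2.0, use_outer=True):
    """Sum over observations and segments of u^m times the residual sum of squares."""
    value = 0.0
    for u_row, o_row, s_row in zip(memberships, outer_sq, inner_sq):
        for u, o, s in zip(u_row, o_row, s_row):
            value += (u ** m) * ((o if use_outer else 0.0) + s)
    return value

outer_sq = [[1.0, 3.0]]
inner_sq = [[1.0, 1.0]]
u_full = update_memberships(outer_sq, inner_sq)                    # about [2/3, 1/3]
u_inner = update_memberships(outer_sq, inner_sq, use_outer=False)  # about [0.5, 0.5]
```

In the toy example, dropping the measurement residuals removes the only information that separates the two segments, so the memberships become equal.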

Despite these advantages, the PLSMFC method has several limitations that warrant further investigation. As with other latent class methods such as REBUS-PLS, FIMIX PLS, PLS GAS, and PLS IRRS, the number of segments for a data set is initially unknown. The appropriate number of segments becomes known only after running the PLSMFC method with different numbers of segments. This makes the method inefficient, because the algorithm has to be run many times. Next, the simulation design in this paper does not include an SEM specification in which all measurement models are formative. If the endogenous latent variable is measured formatively, the scenario for generating the data becomes more complicated, and data generation under such a specification requires further research. Furthermore, the segmentation process in the PLSMFC method uses modified fuzzy clustering in which the fuzzy parameter is set equal to 2. The effect of fuzzy parameter values greater than two on the quality of segmentation is worth investigating. Moreover, fuzzy clustering methods continue to develop, which gives researchers an opportunity to combine the PLS method with the latest fuzzy clustering methods, as in [30,31].

5. Conclusions

In this study, a new method for estimating SEM parameters from heterogeneous data was proposed. This method is a combination of PLS and modified fuzzy clustering. We used the sum of the weighted squared residuals generated by the outer and inner models as a substitute for the Euclidean distance in classical fuzzy clustering.

To evaluate the algorithm, we simulated 432 scenarios from 5 factors and used 100 replications per scenario. Of these, 216 scenarios came from Model 1, in which all measurement models were reflective, and 216 from Model 2, in which the measurement models are a mix of reflective and formative. We generated simulation data with small (50), medium (200), and large (1000) population sizes. The simulation results show that accuracy varies across scenarios. In general, the PLSMFC method demonstrates good accuracy even for small population sizes. The greater the residual variance of the endogenous constructs and the indicators, the smaller the hit ratio. In addition, the accuracy of the re-estimated model parameters is proportional to the hit ratio: the greater the hit ratio, the more accurately the model parameters are re-estimated. This holds for Model 1, with reflective measurement models, and for Model 2, with a mixture of reflective and formative measurement models.

We applied the PLSMFC method to job performance data to examine the relationship between job performance, social skills, intellectual skills, and motivation, in which all constructs were reflectively measured. The results of the application study show that the correct number of segments is two, where all path coefficients are significant at a 5% significance level in both segments 1 and 2. However, there is a slight difference in the measurement model where all the loading coefficients in segment 2 are significant but the loading coefficient of client satisfaction is insignificant in segment 1.

We recommend the application of PLSMFC when data contain unobserved heterogeneity because of its ability to allocate observations to the correct segments. At the same time, the method can appropriately estimate the parameters of the structural equation model for each segment.

Author Contributions

M.A.M. and B.W.O. conceptualized, designed the research, and drafted the paper; A.S. collected and analyzed the data. All the authors have critically read and revised the draft and approved the final paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Ministry of Education, Culture, Research, and Technology of the Republic of Indonesia through the Beasiswa Pendidikan Pascasarjana Dalam Negeri scholarship, No. 2748/A3/KP.04.00/2021.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors thank the Ministry of Education, Culture, Research, and Technology of the Republic of Indonesia for its support. The authors also thank the editor and the referees for their helpful comments.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

Abbreviations	Full Form
SEM	Structural Equation Modeling
PLS	Partial Least Squares
PLS GAS	Partial Least Squares Genetic Algorithm Segmentation
REBUS-PLS	Response Based Units Segmentation Partial Least Squares
PLSMFC	Partial Least Squares and Modified Fuzzy Clustering
FIMIX PLS	Finite Mixture Partial Least Squares
FPI	Fuzziness Performance Index
NCE	Normalized Classification Entropy

References

1. Vinzi, V.E.; Trinchera, L.; Amato, S. PLS Path Modeling: From Foundations to Recent Developments and Open Issues for Model Assessment and Improvement. In Handbook of Partial Least Squares: Concepts, Methods and Application; Vinzi, V.E., Chin, W.W., Henseler, J., Wang, H., Eds.; Springer: New York, NY, USA, 2010; pp. 47–82.
2. Joreskog, K.G.; Sorbom, D.A.G. Recent Developments in Structural Equation Modeling. J. Mark. Res. 1982, 19, 404–416.
3. Steenkamp, J.E.M.; Baumgartner, H. On the use of structural equation models for marketing modeling. Int. J. Res. Mark. 2000, 17, 195–202.
4. Aktepe, A.; Ersöz, S.; Toklu, B. Customer satisfaction and loyalty analysis with classification algorithms and Structural Equation Modeling. Comput. Ind. Eng. 2015, 86, 95–106.
5. Chakraborty, S.; Sengupta, K. Structural equation modelling of determinants of customer satisfaction of mobile network providers: Case of Kolkata. IIMB Manag. Rev. 2014, 26, 234–248.
6. Seddig, D.; Lomazzi, V. Using cultural and structural indicators to explain measurement noninvariance in gender role attitudes with multilevel structural equation modeling. Soc. Sci. Res. 2019, 84, 102328.
7. Bai Gokarna, V.; Mendon, S.; Thonse Hawaldar, I.; Spulbar, C.; Birau, R.; Nayak, S.; Manohar, M. Exploring the antecedents of institutional effectiveness: A case study of higher education universities in India. Econ. Res. Istraz. 2022, 35, 1162–1182.
8. Hair, J.; Alamer, A. Partial Least Squares Structural Equation Modeling (PLS-SEM) in second language and education research: Guidelines using an applied example. Res. Methods Appl. Linguist. 2022, 1, 100027.
9. Burić, I.; Kim, L.E. Teacher self-efficacy, instructional quality, and student motivational beliefs: An analysis using multilevel structural equation modeling. Learn. Instr. 2020, 66, 101302.
10. Fauzi, A.; Hidayat, N.R.; Otok, B.W.; Waluyo, M. Clustering partial least square in lecturer achievement index (LAI) based on student perception of UPN ‘Veteran’ Surabaya. Int. J. Mech. Eng. Technol. 2018, 9, 273–284.
11. Otok, B.W.; Sustrami, D.; Hastuti, P.; Purnami, S.W.; Suharsono, A. Structural equation modeling the environment, psychology, social relationships against physical health in determination quality of elderly community surabaya. Int. J. Civ. Eng. Technol. 2018, 9, 926–938.
12. Kang, W.; Shao, B. The impact of voice assistants’ intelligent attributes on consumer well-being: Findings from PLS-SEM and fsQCA. J. Retail. Consum. Serv. 2023, 70, 103130.
13. Chuah, S.H.W.; Tseng, M.L.; Wu, K.J.; Cheng, C.F. Factors influencing the adoption of sharing economy in B2B context in China: Findings from PLS-SEM and fsQCA. Res. Conserv. Recycl. 2021, 175, 105892.
14. Hidayat, R.N.; Poernomo, E.; Waluyo, M.; Otok, B.W. The model of risk of travel ticket purchasing decisions on marketing communication mix in online site using structural equation modeling. Int. J. Civ. Eng. Technol. 2018, 9, 847–856.
15. Hu, Z.; Ding, S.; Li, S.; Chen, L.; Yang, S. Adoption intention of fintech services for bank users: An empirical examination with an extended technology acceptance model. Symmetry 2019, 11, 340.
16. Mican, D.; Sitar-Tăut, D.A.; Mihuţ, I.S. User behavior on online social networks: Relationships among social activities and satisfaction. Symmetry 2020, 12, 1656.
17. Nicolas, C.; Kim, J.; Chi, S. Quantifying the dynamic effects of smart city development enablers using structural equation modeling. Sustain. Cities Soc. 2020, 53, 101916.
18. Bollen, K.A. Structural Equations with Latent Variables; Wiley: New York, NY, USA, 2014.
19. Lohmöller, J.B. Latent Variable Path Modeling with Partial Least Squares; Physica-Verlag HD: Berlin, Germany, 2013.
20. Hair, J.F.; Hult, G.T.M.; Ringle, C.M.; Sarstedt, M. A Primer on Partial Least Squares Structural Equation Modeling (PLS-SEM); SAGE Publications: Thousand Oaks, CA, USA, 2013.
21. Hahn, C.; Johnson, M.D.; Herrmann, A.; Huber, F. Capturing Customer Heterogeneity using a Finite Mixture PLS Approach. Schmalenbach Bus. Rev. 2002, 54, 243–269.
22. Hair, J.F.; Sarstedt, M.; Matthews, L.M.; Ringle, C.M. Identifying and treating unobserved heterogeneity with FIMIX-PLS: Part I–method. Eur. Bus. Rev. 2016, 28, 63–76.
23. Jedidi, K.; Jagpal, H.S.; Desarbo, W.S. Finite-Mixture Structural Equation Models for Response-Based Segmentation and Unobserved Heterogeneity. Mark. Sci. 1997, 16, 39–59.
24. Ringle, C.M.; Sarstedt, M. Treating unobserved heterogeneity in PLS path modelling: A comparison of FIMIX-PLS with different data. J. Appl. Stat. 2010, 37, 1299–1318.
25. Schlittgen, R.; Ringle, C.M.; Sarstedt, M.; Becker, J. Segmentation of PLS path models by iterative reweighted regressions. J. Bus. Res. 2016, 69, 4583–4592.
26. Vinzi, V.E.; Lauro, C.N.; Amato, S. PLS Typological Regression: Algorithmic, Classification and Validation Issues. In New Developments in Classification and Data Analysis: Proceedings of the Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society; University of Bologna: Bologna, Italy, 2003; pp. 133–140.
27. Vinzi, V.E.; Trinchera, L.; Squillacciotti, S.; Tenenhaus, M. REBUS-PLS: A response-based procedure for detecting unit segments in PLS path modelling. Appl. Stoch. Model. Bus. Ind. 2008, 24, 439–458.
28. Ansari, A.; Jedidi, K.; Jagpal, S. A Hierarchical Bayesian Methodology for Treating Heterogeneity in Structural Equation Models. Mark. Sci. 2014, 19, 328–347.
29. Ringle, C.M.; Sarstedt, M.; Schlittgen, R. Genetic algorithm segmentation in partial least squares structural equation modeling. OR Spectr. 2014, 36, 251–276.
30. Bhagat, A.; Kshirsagar, N.; Khodke, P.; Dongre, K.; Ali, S. Penalty Parameter Selection for Hierarchical Data Stream Clustering. Procedia Comput. Sci. 2016, 79, 24–31.
31. Berget, I.; Mevik, B.H.; Næs, T. New modifications and applications of fuzzy C-means methodology. Comput. Stat. Data Anal. 2008, 52, 2403–2418.
32. Krishnapuram, R.; Keller, J.M. A Possibilistic Approach to Clustering. IEEE Trans. Fuzzy Syst. 1993, 1, 98–110.
33. Nayak, J.; Behera, H.S.; Naik, B. Fuzzy C-Means (FCM) Clustering Algorithm: A Decade Review from 2000 to 2014. In Smart Innovation, Systems and Technologies; Springer: New Delhi, India, 2015; Volume 32.
34. Hwang, H.; DeSarbo, W.S.; Takane, Y. Fuzzy Clusterwise Generalized Structured Component Analysis. Psychometrika 2007, 72, 181–198.
35. Tang, Y.; Pan, Z.; Pedrycz, W.; Ren, F.; Song, X. Viewpoint-Based Kernel Fuzzy Clustering with Weight Information Granules. IEEE Trans. Emerg. Top. Comput. Intell. 2022.
36. Wu, C.; Zhang, X. A self-learning iterative weighted possibilistic fuzzy c-means clustering via adaptive fusion. Expert Syst. Appl. 2022, 209, 118280.
37. Zhang, Y.; Bai, X.; Fan, R.; Wang, Z. Deviation-sparse fuzzy C-means with neighbor information constraint. IEEE Trans. Fuzzy Syst. 2019, 27, 185–199.
38. Wei, H.; Chen, L.; Chen, C.L.P.; Duan, J.; Han, R.; Guo, L. Fuzzy clustering for multiview data by combining latent information. Appl. Soft Comput. 2022, 126, 109140.
39. Chin, W.W. The Partial Least Squares Approach to Structural Equation Modeling. In Modern Methods for Business Research; Lawrence Erlbaum Associates Publishers: Mahwah, NJ, USA, 1998; pp. 295–336.
40. Wedel, M.; Steenkamp, J.B.E.M. A fuzzy clusterwise regression approach to benefit segmentation. Int. J. Res. Mark. 1989, 6, 241–258.
41. Wedel, M. Clusterwise Regression and Market Segmentation Developments and Applications; Toxicology and Nutrition Institute: Zeist, Netherlands, 1986.
42. Ryoo, J.H.; Park, S.; Kim, S.; Ryoo, H.S. Efficiency of cluster validity indexes in fuzzy clusterwise generalized structured component analysis. Symmetry 2020, 12, 1514.

Figure 1. Path diagram of Model 1 (source: author’s own contribution).

Figure 2. Path diagram of Model 2 (source: author’s own contribution).

Figure 3. Hit ratio of Models 1 and 2 in all factors shown for the (a) number of segments, (b) population size, (c) residual variance of endogenous variables, (d) residual variance of indicator, and (e) distribution (source: author’s own contribution).

Figure 4. Hit ratio in the combination of factor levels of Model 1 (source: author’s own contribution).

Figure 5. Hit ratio in the combination of factor levels of Model 2 (source: author’s own contribution).

Figure 6. Path diagram describing the relationship between latent variables (source: author’s own contribution).

Figure 7. Fuzzy performance index (FPI) and normalized classification entropy (NCE) were calculated to identify the optimum segment for the job performance data (source: author’s own contribution).

Table 1. Parameters in each segment.

Parameter	Segment 1	Segment 2	Segment 3
$λ_{11}$	0.50	0.65	0.80
$λ_{12}$	0.55	0.70	0.85
$λ_{13}$	0.60	0.75	0.90
$λ_{21}$	0.50	0.65	0.80
$λ_{22}$	0.55	0.70	0.85
$λ_{23}$	0.60	0.75	0.90
$λ_{31}$	0.50	0.65	0.80
$λ_{32}$	0.55	0.70	0.85
$λ_{33}$	0.60	0.75	0.90
$β_{1}$	0.50	0.70	0.90
$β_{2}$	0.50	0.70	0.90

Source: author’s own contribution.

Table 2. The mean parameter estimates of Model 1 with distribution of latent variables N(0,1), residual variance of endogenous latent variables of 5%, residual variance of indicators of 5%, and number of segments 2.

Parameter	True Value	Mean Estimate (N = 50)	Mean Estimate (N = 200)	Mean Estimate (N = 1000)
Segment 1
$λ_{111}$	0.50	0.5009	0.5018	0.5009
$λ_{121}$	0.55	0.5497	0.5514	0.5513
$λ_{131}$	0.60	0.6003	0.6019	0.6010
$λ_{211}$	0.50	0.5004	0.5011	0.5008
$λ_{221}$	0.55	0.5516	0.5514	0.5512
$λ_{231}$	0.60	0.6026	0.6014	0.6010
$λ_{311}$	0.50	0.5003	0.5014	0.5024
$λ_{321}$	0.55	0.5522	0.5507	0.5511
$λ_{331}$	0.60	0.6029	0.6019	0.6011
$β_{11}$	0.50	0.5004	0.5014	0.5007
$β_{21}$	0.50	0.502	0.5002	0.5006
Segment 2
$λ_{112}$	0.65	0.6492	0.6497	0.6488
$λ_{122}$	0.70	0.6999	0.6987	0.6988
$λ_{132}$	0.75	0.7497	0.7488	0.7485
$λ_{212}$	0.65	0.6474	0.6491	0.6484
$λ_{222}$	0.70	0.6973	0.6979	0.6990
$λ_{232}$	0.75	0.7482	0.7478	0.7490
$λ_{312}$	0.65	0.6503	0.6495	0.6496
$λ_{322}$	0.70	0.6987	0.6997	0.6995
$λ_{332}$	0.75	0.7500	0.7495	0.7493
$β_{12}$	0.70	0.6991	0.6990	0.6989
$β_{22}$	0.70	0.6968	0.6991	0.6990
Hit Ratio		97.94%	98.08%	98.03%

Source: author’s own contribution.

Table 3. The mean parameter estimates of Model 2 with distribution of latent variables N(0,1), residual variance of endogenous latent variables of 5%, residual variance of indicators of 5%, and number of segments 2.

Parameter	True Value	Mean Estimate (N = 50)	Mean Estimate (N = 200)	Mean Estimate (N = 1000)
Segment 1
$λ_{111}$	0.50	0.4995	0.4998	0.5006
$λ_{121}$	0.55	0.5506	0.5507	0.5506
$λ_{131}$	0.60	0.5985	0.6005	0.6005
$λ_{211}$	0.50	0.5006	0.5008	0.5002
$λ_{221}$	0.55	0.5509	0.5502	0.5506
$λ_{231}$	0.60	0.6006	0.5995	0.6002
$λ_{311}$	0.50	0.5006	0.5014	0.500
$λ_{321}$	0.55	0.5509	0.5501	0.5510
$λ_{331}$	0.60	0.6007	0.6007	0.6008
$β_{11}$	0.50	0.5007	0.5005	0.5004
$β_{21}$	0.50	0.5004	0.5001	0.5010
Segment 2
$λ_{112}$	0.65	0.6500	0.6496	0.6493
$λ_{122}$	0.70	0.7007	0.7001	0.6997
$λ_{132}$	0.75	0.7492	0.7500	0.7498
$λ_{212}$	0.65	0.6516	0.6501	0.6494
$λ_{222}$	0.70	0.6987	0.7002	0.6998
$λ_{232}$	0.75	0.7498	0.7497	0.7493
$λ_{312}$	0.65	0.6499	0.6499	0.6500
$λ_{322}$	0.70	0.6986	0.7004	0.6999
$λ_{332}$	0.75	0.7509	0.7496	0.7502
$β_{12}$	0.70	0.7012	0.6997	0.6998
$β_{22}$	0.70	0.7000	0.6999	0.6998
Hit Ratio		97.92%	98.09%	98.19%

Source: authors’ own contribution.
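One way to summarize how close the mean estimates are to the values used to generate the data is the relative bias, 100 × (estimate − true value) / true value. The sketch below applies this to the path coefficients of Table 3. The numbers are copied from the table; the variable and function names are illustrative, and relative bias is a common summary rather than a measure this section reports.

```python
# True values and mean estimates copied from Table 3 (Model 2, path coefficients).
true_values = {"beta_11": 0.50, "beta_21": 0.50, "beta_12": 0.70, "beta_22": 0.70}
mean_estimates = {
    "beta_11": {50: 0.5007, 200: 0.5005, 1000: 0.5004},
    "beta_21": {50: 0.5004, 200: 0.5001, 1000: 0.5010},
    "beta_12": {50: 0.7012, 200: 0.6997, 1000: 0.6998},
    "beta_22": {50: 0.7000, 200: 0.6999, 1000: 0.6998},
}


def relative_bias_percent(estimate, true_value):
    """Relative bias of a mean estimate, in percent of the true value."""
    return 100.0 * (estimate - true_value) / true_value


# Relative bias for every path coefficient and sample size.
bias = {
    name: {n: relative_bias_percent(est, true_values[name]) for n, est in by_n.items()}
    for name, by_n in mean_estimates.items()
}

# Largest absolute relative bias across all coefficients and sample sizes.
largest = max(abs(b) for by_n in bias.values() for b in by_n.values())

for name, by_n in bias.items():
    print(name, {n: round(b, 3) for n, b in by_n.items()})
print(f"Largest absolute relative bias: {largest:.2f}%")
```

For these path coefficients the largest absolute relative bias is about 0.2%, at every sample size considered.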

Table 4. Bootstrap standard errors and critical ratios for the outer loadings and path coefficients obtained with the PLSMFC algorithm.

Segment / Relationship	Loading/Path Coefficient	Bootstrap Standard Error	Bootstrap Critical Ratio	Significance
Segment 1
Outer Loading
Social $\to$ Psych_Test1	0.8640	0.0464	18.6385	Yes
Social $\to$ Psych_Test2	0.9547	0.0538	17.7607	Yes
Intellect $\to$ Years_Edu	0.8245	0.0086	95.8112	Yes
Intellect $\to$ IQ	0.8838	0.0069	128.3720	Yes
Motivation $\to$ Hrs_Train	0.9225	0.0294	31.3326	Yes
Motivation $\to$ Hrs_Work	0.9796	0.0251	39.0129	Yes
Job_Perform $\to$ Client_Sat	0.3760	0.2693	1.3962	No
Job_Perform $\to$ Super_Sat	1.0157	0.0802	12.6617	Yes
Job_Perform $\to$ Project_Compl	0.9887	0.1983	4.9868	Yes
Path Coefficient
Social $\to$ Job_Perform	0.5191	0.0259	20.0107	Yes
Intellect $\to$ Job_Perform	0.1113	0.0102	10.8679	Yes
Motivation $\to$ Job_Perform	0.8179	0.0131	60.4351	Yes
Segment 2
Outer Loading
Social $\to$ Psych_Test1	0.9566	0.0464	20.6368	Yes
Social $\to$ Psych_Test2	0.8473	0.0538	15.7623	Yes
Intellect $\to$ Years_Edu	0.8073	0.0086	93.8128	Yes
Intellect $\to$ IQ	0.8975	0.0069	130.3702	Yes
Motivation $\to$ Hrs_Train	0.9813	0.0294	33.3310	Yes
Motivation $\to$ Hrs_Work	0.9294	0.0251	37.0146	Yes
Job_Perform $\to$ Client_Sat	0.9142	0.2693	3.3945	Yes
Job_Perform $\to$ Super_Sat	0.8554	0.0802	10.6634	Yes
Job_Perform $\to$ Project_Compl	0.5925	0.1983	2.9885	Yes
Path Coefficient
Social $\to$ Job_Perform	0.5710	0.0259	22.0090	Yes
Intellect $\to$ Job_Perform	0.1318	0.0102	12.8662	Yes
Motivation $\to$ Job_Perform	0.7917	0.0131	62.4351	Yes

Source: authors’ own contribution.
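The critical ratio column is conventionally the estimate divided by its bootstrap standard error. The sketch below recomputes it for the two Job_Perform → Client_Sat loadings in Table 4. The cut-off of 1.96 (a two-sided 5% test under approximate normality) is an assumption, since this section does not state the threshold, and because the tabulated standard errors are rounded, recomputed ratios match the printed ones only approximately.

```python
def critical_ratio(estimate, std_error):
    """Estimate divided by its bootstrap standard error."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    return estimate / std_error


# Assumed cut-off: two-sided 5% test under approximate normality.
THRESHOLD = 1.96

# (estimate, bootstrap standard error) copied from Table 4.
rows = {
    "Job_Perform -> Client_Sat (Segment 1)": (0.3760, 0.2693),
    "Job_Perform -> Client_Sat (Segment 2)": (0.9142, 0.2693),
}

results = {}
for name, (estimate, std_error) in rows.items():
    cr = critical_ratio(estimate, std_error)
    results[name] = (cr, abs(cr) > THRESHOLD)

for name, (cr, significant) in results.items():
    print(f"{name}: CR = {cr:.4f}, significant = {significant}")
```

Under this assumed threshold, the Segment 1 loading falls below the cut-off and the Segment 2 loading exceeds it, in line with the "No" and "Yes" entries in the table.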

Table 5. Bootstrap standard errors and critical ratios for the outer loadings and path coefficients obtained with REBUS-PLS.

Segment / Relationship	Loading/Path Coefficient	Bootstrap Standard Error	Bootstrap Critical Ratio	Significance
Segment 1
Outer Loading
Social $\to$ Psych_Test1	0.9166	0.0005	1667.0259	Yes
Social $\to$ Psych_Test2	0.8688	0.0002	4244.2827	Yes
Intellect $\to$ Years_Edu	0.7735	0.0275	28.1621	Yes
Intellect $\to$ IQ	0.9150	0.0267	34.3155	Yes
Motivation $\to$ Hrs_Train	0.9655	0.0007	1290.4154	Yes
Motivation $\to$ Hrs_Work	0.9763	0.0003	3749.5041	Yes
Job_Perform $\to$ Client_Sat	0.7645	0.0208	36.7085	Yes
Job_Perform $\to$ Super_Sat	0.9293	0.0002	4233.9560	Yes
Job_Perform $\to$ Project_Compl	0.9087	0.0094	96.7496	Yes
Path Coefficient
Social $\to$ Job_Perform	0.5218	0.0036	145.8676	Yes
Intellect $\to$ Job_Perform	0.1203	0.0029	40.8461	Yes
Motivation $\to$ Job_Perform	0.7400	0.0080	92.4914	Yes
Segment 2
Outer Loading
Social $\to$ Psych_Test1	0.9358	0.0003	2697.4705	Yes
Social $\to$ Psych_Test2	0.9059	0.0002	4067.5193	Yes
Intellect $\to$ Years_Edu	0.7835	0.0920	8.5134	Yes
Intellect $\to$ IQ	0.8469	0.0380	22.2688	Yes
Motivation $\to$ Hrs_Train	0.9258	0.0013	701.0146	Yes
Motivation $\to$ Hrs_Work	0.9415	0.0005	1928.7409	Yes
Job_Perform $\to$ Client_Sat	0.7110	0.0098	72.5408	Yes
Job_Perform $\to$ Super_Sat	0.9524	0.0003	3686.7002	Yes
Job_Perform $\to$ Project_Compl	0.6868	0.0113	61.0473	Yes
Path Coefficient
Social $\to$ Job_Perform	0.6659	0.0054	122.2898	Yes
Intellect $\to$ Job_Perform	0.0737	0.0066	11.1178	Yes
Motivation $\to$ Job_Perform	0.7141	0.0116	61.4591	Yes

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
