1. Introduction
The classical normal linear regression model (CNLRM) is useful for modelling data from a homogeneous population. However, heterogeneous populations, composed of several sub-populations, are common in fields such as economics and the environmental sciences, among others. To model data from such populations, Quandt [1] and subsequently Goldfeld and Quandt [2] introduced mixtures of NLRMs (MNLRMs). Estimation and inference for these models were studied by Hurn et al. [3] and Frühwirth-Schnatter [4] using Bayesian methods, and by DeSarbo and Cron [5] and De Veaux [6] using maximum likelihood via the expectation-maximisation (EM) algorithm [7]. A systematic study of these models can be found in [4].
The MNLRMs work well when the component regression functions (CRFs) can be assumed to be linear parametric functions of the covariates. However, the latter assumption seldom holds for all covariates; the effect of one or more covariates may instead be characterised by a non-parametric function. For this reason, Wu and Liu [8] proposed the semi-parametric mixture of partial linear models (SPMPLMs), where each CRF is a linear combination of a parametric function of some of the covariates and a non-parametric function of one of the covariates. The model assumes that each component has a Gaussian, and hence symmetric, distribution. The SPMPLMs combine the advantages of parametric and non-parametric functions. In addition to providing a flexible approach for detecting unobserved regression relationships, this model alleviates the curse of dimensionality (COD) to a certain extent. Moreover, the fitted non-parametric functions can be used to suggest a suitable functional form for the covariate. However, estimation of this model poses a computational challenge. A local-likelihood approach to estimating the non-parametric functions is prone to a form of label switching: maximising each local-likelihood function separately using the EM algorithm does not guarantee that the labels will be the same at each local point, because, at the E-step of the EM algorithm, each local point generates a unique set of responsibilities. The latter will be referred to as local responsibilities. The resulting estimated non-parametric functions are non-smooth and not useful in practice.
This problem was first mentioned by Huang and Yao [9] in the context of estimating MNLRMs in which the mixing proportions are non-parametric functions of a covariate. This phenomenon is akin to the label-switching problem that occurs when estimating Bayesian mixtures using MCMC procedures [10].
To solve the label-switching problem, the EM algorithm must be modified. This is achieved by using the same responsibilities to maximise every local-likelihood function at the M-step of the EM algorithm [11], which implies that a global set of responsibilities must be obtained. Following this idea, Huang et al. [11] proposed the effective EM algorithm for estimating the non-parametric mixture of normal regressions (NPMNRs), where the mixing proportions, variances and regression functions are all non-parametric functions of a covariate. Xiang and Yao [12] proposed local EM-type and global EM-type algorithms for estimating the semi-parametric mixture of non-parametric regressions (SPMNPRs), where only the CRFs are non-parametric. To estimate the SPMPLMs, Wu and Liu [8] proposed the profile likelihood EM (PL-EM) algorithm. Huang et al. [13] proposed a local-likelihood estimation procedure via a modified EM algorithm to estimate the mixture of varying coefficient models. To estimate the non-parametric and parametric terms of the mixture of single index models (MSIMs) and the mixture of regression models with varying single-index proportions (MRSIP), Xiang and Yao [14] proposed a one-step backfitting estimation procedure via a modified EM algorithm. Zhang and Zheng [15] and Zhang and Pan [16] proposed spline-backfitted kernel (SBK) estimation procedures via modified EM algorithms to estimate the semi-parametric mixture of additive regressions (SPMARs) and the semi-parametric mixture of partially linear additive regressions (SPMPLARs), respectively. The above algorithms are local-likelihood-based EM-type algorithms. More recently, Xue and Yao [17] proposed a neural network-based EM-type algorithm to estimate MNLRMs with non-parametrically covariate-varying mixing proportions. Xue [18] also proposed a neural network EM-type algorithm to estimate a new form of the mixture of experts (MoE) model [19] with non-parametric CRFs.
In each of the above-mentioned local-likelihood-based fitting algorithms, the global responsibilities are calculated without taking into account the information from the local responsibilities. To estimate the NPMNRs, Skhosana et al. [20] proposed a novel EM-type algorithm that takes this local information into account. A simulation study showed that the performance of the algorithm is at least as good as that of the effective EM algorithm.
Motivated by the practical importance of the SPMPLMs, as mentioned above, the present paper is concerned with estimating this model. Our first research question was: how can we obtain smooth estimates of the non-parametric functions in the presence of label switching? Our response to this question follows the same ideas as in [20]. Speckman [21] showed that a less biased and more efficient estimator of the parametric term of the CRF can be obtained via partial residuals. However, this requires the computation of an n × n smoother matrix, where n is the sample size, which can be computationally costly for large n, especially when applied within an iterative algorithm. Our second research question was: in an effort to reduce the computational burden imposed by the partial-residuals estimator, is there an estimator that can perform at least as well? In answer to the above questions, the current study makes the following contributions to estimation and computation for the SPMPLMs. Firstly, we propose a one-step backfitting EM-type algorithm to address the label-switching problem. The proposed algorithm estimates the non-parametric term as follows:
1. At the E-step, we calculate the local responsibilities at each point in the set of grid points;
2. At the M-step, we simultaneously maximise all the local-likelihood functions, using each set of local responsibilities in turn; in other words, each set of local responsibilities yields an estimate of the non-parametric term;
3. Among the estimates obtained in step 2, we choose the smoothest estimate as the final estimate of the non-parametric term.
We repeat the above three steps until convergence. Steps 1 and 2 are the regular E- and M-steps of the EM algorithm. Step 3 can be seen as a smoothing step. Indeed, at each iteration, the algorithm can be viewed as removing the roughness (or wiggliness) from the non-parametric term due to the effect of label switching. Using the resulting estimate of the non-parametric term, the algorithm continues by estimating the parametric term. Finally, we use the latter to improve the estimate of the non-parametric term.
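The E-step responsibilities and the step-3 smoothness criterion can be sketched as follows. This is a minimal Python illustration with hypothetical data (the paper's implementation is in R); the roughness measure used here, the sum of squared second differences, is one plausible way to quantify the wiggliness that label switching induces:

```python
import numpy as np

def responsibilities(y, means, sds, props):
    """E-step: posterior probabilities of component membership for each y_i
    under a K-component Gaussian mixture (means, sds, props: length-K)."""
    dens = props * np.exp(-0.5 * ((y[:, None] - means) / sds) ** 2) / sds
    return dens / dens.sum(axis=1, keepdims=True)

def roughness(curve):
    """Sum of squared second differences: a discrete measure of wiggliness."""
    return float(np.sum(np.diff(curve, n=2) ** 2))

# E-step at one local point: observations near a component mean receive
# responsibilities close to 0 or 1 for that component.
r = responsibilities(np.array([0.0, 3.0]),
                     means=np.array([0.0, 3.0]),
                     sds=np.array([1.0, 1.0]),
                     props=np.array([0.5, 0.5]))

# Step 3: among candidate local-likelihood fits of one component's curve,
# keep the smoothest. A label switch halfway along the grid shows up as a
# kink, inflating the roughness of that candidate.
grid = np.linspace(0.0, 1.0, 50)
aligned = np.sin(2 * np.pi * grid)
switched = np.where(grid < 0.5, np.sin(2 * np.pi * grid),
                    -np.sin(2 * np.pi * grid))
best = min([aligned, switched], key=roughness)
```

The smoothest candidate (here, the label-aligned one) is retained as the global estimate.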
Secondly, we propose a regression spline-based estimation procedure for the SPMPLMs. This approach does not use local-likelihood estimation, and thus it is free from the label-switching problem. Simulation results show that this approach performs at least as well as the profile likelihood approach in estimating the parametric term. However, the former approach performs better than the latter approach in estimating the non-parametric term. To demonstrate the practical use of the proposed methods, we consider an analysis of two climate datasets.
Lastly, to reduce the computational burden imposed by the partial-residuals estimator of the parametric part of the CRF, we propose a plug-in estimator.
The rest of the paper is organised as follows. In Section 2, we give a brief definition of the SPMPLMs. In Section 3, we discuss likelihood estimation for the SPMPLMs; we present two approaches, regression spline-based estimation and profile likelihood-based estimation, and for the latter we highlight the label-switching problem and then present the proposed solution. Section 4 presents a data-driven method for selecting the smoothing parameter. Section 5 and Section 6 present the results of our simulation studies and an application to two climate datasets, respectively. Section 7 concludes and gives directions for future research.
2. Model Definition
Let Y, T and a covariate vector be random variables from a population consisting of K sub-populations. We assume throughout the paper that both Y and T are univariate and that the covariate vector is p-dimensional. Sub-population membership is indexed by a latent variable with a discrete distribution whose probabilities are the mixing proportions. Conditional on the covariates and on membership of a given sub-population, Y follows a Gaussian distribution whose CRF is the sum of a parametric term, linear in the covariate vector with a p-dimensional vector of regression coefficients, and a smooth unknown non-parametric function of the covariate t. Conditional on the covariates alone, Y follows a mixture of Gaussians.
Note that, in order to ensure identifiability, the covariate vector does not include an intercept term. In model (1), the CRF is semi-parametric, being a linear combination of a parametric term and a non-parametric term. In effect, the p covariates are assumed to be linearly related to y, whereas the relationship between t and y is assumed to be characterised by a non-parametric function. The component variances and mixing proportions are assumed to be constant and thus parametric. For K = 1, model (1) is a partial linear regression model; when the non-parametric term is constant, model (1) is a mixture of linear regressions model. Thus, model (1) is a natural extension of both the partial linear model and the mixture of linear regressions model.
3. Model Estimation
In order to estimate model (1), we must estimate both the parametric and the non-parametric terms of the model. A likelihood approach to estimation is adopted for this purpose. The component regression coefficients, variances, mixing proportions and non-parametric functions are collected into a single vector of unknowns. For a random sample generated from model (1), the log-likelihood function is given by (2). Due to the presence of the non-parametric functions in (2), it is not straightforward to maximise the log-likelihood function to obtain the maximum likelihood estimates. In the following, we discuss two likelihood-based approaches to estimating both the parametric and non-parametric terms of model (1).
3.1. Regression Spline-Based Estimation
We begin by discussing a regression spline-based estimator. A regression spline is a non-parametric estimator that parametrises a function using a set of piecewise polynomial functions joined at a set of points, or knots, in the domain of the function (see Wu and Zhang [22] and James et al. [23] for more details). We make use of cubic splines. Each component's non-parametric function can be written as a linear combination of basis functions, as in (3), where J is the number of internal knots and the basis-function coefficients are component-specific. We make use of the B-spline basis functions, mainly because of their numerical stability (see Fan and Gijbels [24]).
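To make the basis construction in (3) concrete, here is a Python sketch of a cubic B-spline design matrix using SciPy (the paper's implementation uses R's splines::bs; the knot placement and counts below are illustrative assumptions):

```python
import numpy as np
from scipy.interpolate import BSpline

def cubic_bspline_basis(t, internal_knots, lo=0.0, hi=1.0):
    """Cubic B-spline design matrix B(t), analogous to R's splines::bs().
    Columns are the basis functions evaluated at the points t."""
    k = 3  # cubic
    knots = np.concatenate([[lo] * (k + 1), internal_knots, [hi] * (k + 1)])
    return BSpline.design_matrix(t, knots, k).toarray()

t = np.linspace(0, 1, 200)
B = cubic_bspline_basis(t, internal_knots=np.quantile(t, [0.25, 0.5, 0.75]))
# J = 3 internal knots gives J + 4 = 7 cubic basis functions.

# One component's non-parametric term is then the linear combination
# B(t) @ gamma_k for a component-specific coefficient vector gamma_k.
gamma_k = np.ones(B.shape[1])
m_k = B @ gamma_k  # rows of B sum to 1, so m_k is identically 1 here
```

Because the basis is fixed once the knots are chosen, the regression function becomes fully parametric in the spline coefficients, which is what makes the EM updates in the next subsection closed-form.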
After substituting (3) into (2), the log-likelihood function can be written as (4), where the vector of unknowns now collects all the model's parameters, including the spline coefficients. Note that the regression function is now completely parametric, being a linear combination in which the regression coefficients and the spline coefficients both enter linearly. To maximise the log-likelihood function (4), we make use of the EM algorithm. Towards that end, we define a latent component-indicator variable, as in (5). Thus, the complete data comprise the observed data together with the latent indicators. The corresponding complete-data log-likelihood is given by (6).
Estimation Algorithm
Since the component indicators are latent variables, we maximise the conditional expected value of (6). Towards that end, at each iteration of the EM algorithm, in the E-step we estimate the indicators by their conditional expected values, the responsibilities. In the M-step, we update the parameter vector by maximising the resulting conditional expected log-likelihood; maximising it with respect to the mixing proportions, regression coefficients, spline coefficients and variances, respectively, yields closed-form update equations.
Upon convergence of the above EM algorithm, the estimated spline-coefficient vector is available for each component, whence the estimator of the non-parametric function is obtained by evaluating the fitted linear combination of the basis functions.
3.2. Profile Likelihood Estimation
We now discuss a profile likelihood approach to the estimation of model (1). When estimating a semi-parametric model, the profile likelihood approach proceeds in two steps, estimating each term in turn. First, the non-parametric function is profiled out as a nuisance parameter: for a fixed value of the parametric term, an estimate of the function, referred to as a least favourable curve, is obtained. The profile likelihood is then derived as in (13). Note that the log-likelihood function in Equation (13) is the same as (2), with the least favourable curve substituted for the non-parametric function. To estimate the parametric term, the profile likelihood function (13) is maximised to obtain the maximum likelihood estimates (see Severini and Wong [25] for more details).
We make use of the local-likelihood kernel approach [26] to estimate the non-parametric function. Consider a set of N grid points in the domain of the covariate t. For a random sample from model (1), the log of the local-likelihood function for the non-parametric function at a grid point is defined as in (14). By maximising (14), we obtain the local-likelihood estimate of the function at each grid point. We make use of the local mean estimator (see Fan and Gijbels [24] for more details about this estimator). By linearly interpolating the estimates over the grid points, we obtain an estimate of the function at every observed value of t. By substituting the latter into (13), we obtain the profile likelihood function, whence we can derive the maximum likelihood estimators of the mixing proportions, regression coefficients and variances. For a given random sample, the estimates of the latter parameters, together with the non-parametric estimate, are referred to as profile likelihood estimates. In the following section, we discuss how the above profile likelihood estimation procedure is implemented.
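The local mean idea can be sketched in a few lines of Python. This is a toy one-component illustration with hypothetical data: the estimate at a grid point is taken to be a kernel-weighted average of partial residuals, with the responsibilities entering as additional weights (an assumption about the estimator's form, in the spirit of, not identical to, the paper's expressions):

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, the kernel used in the paper's simulations."""
    return 0.75 * np.clip(1 - u**2, 0, None)

def local_mean(u, t, resid, resp, h):
    """Local mean estimate of one component's non-parametric term at grid
    point u: a kernel- and responsibility-weighted average of the partial
    residuals (sketched form; resp holds one component's responsibilities)."""
    w = epanechnikov((t - u) / h) * resp
    return np.sum(w * resid) / np.sum(w)

rng = np.random.default_rng(0)
t = rng.uniform(0, 1, 500)
resid = np.sin(2 * np.pi * t) + rng.normal(0, 0.1, 500)  # toy partial residuals
resp = np.ones(500)                                      # one-component toy case
grid = np.linspace(0.1, 0.9, 9)
m_hat = np.array([local_mean(u, t, resid, resp, h=0.1) for u in grid])
```

Linear interpolation of the grid-point values then extends the estimate to every observed t, as described above.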
3.2.1. Local-Likelihood EM (LL-EM) Algorithm
In this section, we present a naive implementation of the above profile likelihood estimation procedure. The EM algorithm is applied separately to maximise each of the N local-likelihood functions (14), followed by another separate application of the algorithm to the maximisation of the profile likelihood function (13). As we demonstrate below, this approach is subject to the label-switching problem, as mentioned in [11] and comprehensively discussed in [20]. The observed data are augmented with the latent indicator variable defined in (5) to form the complete data, and the log of the complete-data local-likelihood function is given by (15). For each grid point u, we maximise (15) using the EM algorithm. In the E-step of each iteration, we estimate the latent variable using the responsibilities (16). In the M-step, we update the local parameters by maximising the conditional expectation of (15), given by (17); maximising (17) for each component yields the estimator (18). To obtain the estimate at every observed t, we interpolate over the grid points.
Let the estimate at convergence of the above EM algorithm be the least favourable curve. Given this curve, the complete-data profile log-likelihood function corresponding to (13) is given by (19). In the E-step, we estimate the latent variable by the responsibilities (20), from which we derive the expected value of (19) as (21). In the M-step, we maximise (21) to update the parameters; the EM update equations for the mixing proportions, regression coefficients and variances are (22)–(24), respectively, where the fitted non-parametric term is computed using a kernel smoother matrix with elements given by (25).
Note that each complete-data local likelihood (15) has its own set of responsibilities at each grid point. This is a possible source of label switching, as the component labels based on the local responsibilities at each grid point are not guaranteed to be aligned. In our simulations using this naive approach, we observed non-smooth estimates of the non-parametric function; the approach is thus quite sensitive to label switching. The solution to this problem is to use the same responsibilities to maximise each local-likelihood function, that is, to use one set of responsibilities to maximise (17) at every grid point. The objective is to obtain this unique set of responsibilities, referred to as the global responsibilities. Using this idea, Wu and Liu [8] proposed a modified EM (PL-EM) algorithm that simultaneously maximises (17) and (21). In the following, we propose a novel approach in which the local responsibilities at each grid point are used to calculate the global responsibilities: the latter are chosen from the former by imposing a smoothness constraint on the estimated non-parametric function. Our approach here follows the same idea used in [20].
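A responsibility-weighted kernel smoother matrix of the kind referred to above can be sketched as follows. The Nadaraya–Watson form shown here is an assumption for illustration (the paper's exact expression in (25) is not reproduced); each row holds the normalised weights mapping the observations to the fitted value at one grid point:

```python
import numpy as np

def smoother_matrix(grid, t, resp, h):
    """Responsibility-weighted Nadaraya-Watson smoother matrix: row i holds
    the normalised Epanechnikov-kernel weights that map observations at
    design points t to the fitted value at grid[i]. resp holds one
    component's responsibilities (a sketched, not the paper's exact, form)."""
    K = 0.75 * np.clip(1 - ((grid[:, None] - t[None, :]) / h) ** 2, 0, None)
    W = K * resp[None, :]
    return W / W.sum(axis=1, keepdims=True)

grid = np.linspace(0.1, 0.9, 5)
t = np.linspace(0, 1, 100)
S = smoother_matrix(grid, t, np.ones(100), h=0.2)
# Each row sums to one, so S reproduces constant functions exactly.
```

Note that building such a matrix over n grid points costs O(n^2) storage and time, which is precisely the burden the plug-in estimator of Section 3.2.3 avoids.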
3.2.2. One-Step Backfitting EM (OB-PL-EM) Algorithm
We now propose a one-step backfitting profile likelihood EM (OB-PL-EM) algorithm to implement the profile likelihood estimation procedure outlined above while addressing the label-switching problem. The algorithm proceeds in three steps. In the first step, we use Algorithm 2 in [20] to obtain the least favourable curve at each grid point. Let the outputs of this step be the estimated curve and the corresponding set of local responsibilities obtained at the selected local point; the latter will be used as the global responsibilities in what follows.
In the second step, we use the EM algorithm to obtain the estimates of the parameters. We use the estimated global responsibilities as the initial responsibilities at the E-step and calculate the kernel smoother matrix for each component as in (25) using these responsibilities. Given the least favourable curve and the kernel smoother matrix, at the M-step we maximise (19) to estimate the elements of the parameter vector. The above E- and M-steps are repeated until convergence, yielding an estimate of the parametric term.
Note that the above estimate of the parametric term is based on a single set of local responsibilities. In the third step of the algorithm, we improve the efficiency of this estimate by maximising the local log-likelihood function (26), which is (14) with the current parameter estimates substituted for the unknown parameters. To maximise (26), we alternate the following E- and M-steps until convergence, initialising the algorithm with the estimates from the previous steps. In the E-step, we calculate the responsibilities as in (27). In the M-step, we update the local function values by maximising the local likelihood using the responsibilities in (27) at every grid point. The E- and M-steps are repeated until convergence. The resulting estimates of the non-parametric and parametric terms are together referred to as the one-step backfitting profile likelihood estimator.
A summary of the above one-step backfitting PL-EM algorithm is given in Algorithm 1.
Remark 1. Note that each stage of Algorithm 1 consists of performing a regular EM algorithm. It follows that the desired ascent property of the algorithm is achieved at each stage.
Remark 2. It is interesting to note at this point that, if we incorporate the proposed regression spline estimation in the first stage of Algorithm 1, the proposed estimation procedure is similar to the SBK method of Zhang and Pan [16]. Thus, the results on the asymptotic behaviour of the resulting estimators can be assumed to follow from those presented in [16]. This is also supported by the finite sample performance of the estimator in our simulations.
Remark 3. The second stage of the algorithm becomes computationally intensive as the sample size n increases, since the number of grid points N must be equal to n. This is because the estimator requires the smoother matrix to be an n × n matrix. For large sample sizes (say, 10,000 or more, as is true for big datasets), it may be impossible to run the algorithm in a reasonable time. This is also true for the PL-EM algorithm proposed by Wu and Liu [8]. In the following section, we propose an alternative estimation procedure that does not require the computation of the smoother matrix.
Algorithm 1 One-step backfitting PL-EM (OB-PL-EM) algorithm for fitting model (1)
Step 1 (estimate the non-parametric function): Repeat Steps 1–3 of Algorithm 2 in [20] until convergence to obtain the least favourable curve and the global responsibilities.
Step 2 (estimate the parameters): Initialisation: use the output of Step 1 to set the initial states of the parameters. E-step: calculate the responsibilities using (20). M-step: update the mixing proportions, regression coefficients and variances using (22)–(24). Iteration: repeat the E- and M-steps until convergence.
Step 3: Re-estimate the non-parametric function.
3.2.3. Profile Likelihood Plug-in EM (PL-p-EM) Algorithm
To reduce the computational burden imposed by the OB-PL-EM estimation procedure, we propose an alternative estimation procedure. To estimate the non-parametric term, we make use of a plug-in estimator, which does not require the computation of the smoother matrix. Let an estimate of the parametric term be obtained as in the OB-PL-EM or PL-EM estimation procedure. To estimate the non-parametric term, we maximise the profile log-likelihood function (28); the plug-in estimator is the maximiser of (28).
The fitting algorithm for the above procedure proceeds in two steps. In the first step, at each iteration, we obtain the parameter estimates as in the OB-PL-EM or PL-EM algorithm, together with the resulting global responsibilities. In the second step, we update the non-parametric term by maximising the expected complete-data version of (28) using the global responsibilities. The resulting update equations for the mixing proportions and variances are the same as (22) and (24), respectively, while the update equation for the non-parametric term is given by (29). From a comparison of (23) and (29), it is not hard to see why we refer to this as a plug-in estimator. The asymptotic behaviour of the plug-in estimator has not yet been studied; however, we are encouraged by its finite sample performance in our simulations. We refer to the resulting estimator as the profile likelihood plug-in estimator. The above estimation procedure is summarised in Algorithm 2.
Algorithm 2 Profile likelihood plug-in EM-type (PL-p-EM) algorithm for fitting model (1)
Initialisation: Choose an initial parameter vector and non-parametric function; these can be obtained using the proposed regression spline-based estimator, for instance.
Step 1: At each iteration, obtain the parameter estimates and the resulting global responsibilities, as in the OB-PL-EM or PL-EM algorithm.
Step 2: Using the global responsibilities, update the non-parametric term, the mixing proportions and the variances using (29), (22) and (24), respectively.
Iteration: Repeat Steps 1 and 2 until convergence.
4. Choosing the Bandwidth
To estimate the non-parametric function, we need to choose an appropriate value for the smoothing parameter, h. In practice, this is usually done in a data-dependent manner using cross-validation (CV) or generalised CV (GCV). For estimating model (1), Wu and Liu [8] proposed a multi-fold CV approach to choosing h. For the single-component case, Speckman [21] proposed a GCV approach to choosing h and provided theoretical evidence to support its application. Owing to its simplicity, we also propose a GCV method to choose h for estimating model (1). GCV provides a data-based estimate of h that aims to minimise an unobservable mean squared error (MSE) between each component's regression function and its estimator.
In matrix notation, each component's fitted regression function can be expressed as a linear smoother applied to the responses (see Buja et al. [27] for more details on linear smoothers). We here define the GCV function in terms of the fitted values and the degrees of freedom, df. In analogy with parametric regression, df represents the effective number of parameters used to estimate the regression function. The GCV criterion selects the bandwidth that minimises the GCV function.
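The selection rule can be sketched as follows, using the classical GCV form with df = tr(S) and a simple Nadaraya–Watson smoother standing in for the model's linear smoother (toy one-component data; the paper's exact criterion and smoother are not reproduced here):

```python
import numpy as np

def nw_smoother_matrix(t, h):
    """Nadaraya-Watson smoother matrix S (Gaussian kernel) on design points t."""
    K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / h) ** 2)
    return K / K.sum(axis=1, keepdims=True)

def gcv(y, S):
    """Classical generalised cross-validation score: mean squared residual
    inflated by the effective degrees of freedom df = tr(S)."""
    n = len(y)
    rss = np.sum((y - S @ y) ** 2)
    return (rss / n) / (1.0 - np.trace(S) / n) ** 2

rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * t) + rng.normal(0, 0.3, 200)

bandwidths = np.linspace(0.01, 0.5, 25)
scores = [gcv(y, nw_smoother_matrix(t, h)) for h in bandwidths]
h_gcv = bandwidths[int(np.argmin(scores))]  # data-driven bandwidth
```

Both under-smoothing (small h, df near n) and over-smoothing (large h, large residuals) inflate the score, so the minimiser balances the two, which is the behaviour exploited in the simulations below.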
5. Simulations
We performed an extensive simulation study to demonstrate the finite sample performance of the methods proposed in this paper. Throughout our simulations, we considered a two-component mixture and a univariate parametric covariate (that is, p = 1), denoted x. We compare the performance of the proposed procedure (OB-PL-EM) with that of the PL-EM procedure proposed by Wu and Liu [8]. To initialise the algorithms, we made use of the regression spline-based estimator (R-spline-EM) with three internal knots chosen to be the first, second and third quartiles of the covariate t. In order to improve the stability of the model estimate and alleviate the dependence on the initial solution, we made use of the following initialisation strategy: fit a mixture of regression splines 100 times from random starts and choose as the initial solution the model with the smallest BIC. We also show the results obtained using the plug-in (PL-p-EM) procedure. The algorithms were implemented in the R programming language (version 3.6.1, released 2019-07-05 [28]). The bs function from the splines R package was used within the R-spline-EM function to compute the basis functions. Throughout our simulations, the two covariates, x and t, were generated from a uniform distribution, and we generated 500 samples at each of several sample sizes, including 400, 800 and 1000. We made use of the Epanechnikov kernel function, with the set of grid points chosen uniformly from the domain of the covariate t.
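Data generation from model (1) under a design of this kind can be sketched as follows. The component parameters below are hypothetical placeholders, not the values from Table 1:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 400

# Hypothetical two-component setting in the spirit of the simulation design:
# y = x * beta_k + m_k(t) + Gaussian noise, with mixing proportions pi_k.
# (The actual Table 1 values are not reproduced here.)
pi = np.array([0.5, 0.5])
beta = np.array([1.0, -1.0])
sigma = np.array([0.3, 0.3])
m0 = lambda t: np.sin(2 * np.pi * t)
m1 = lambda t: np.cos(2 * np.pi * t)

x = rng.uniform(0, 1, n)         # parametric covariate
t = rng.uniform(0, 1, n)         # non-parametric covariate
z = rng.choice(2, size=n, p=pi)  # latent component labels
y = np.where(z == 0,
             x * beta[0] + m0(t),
             x * beta[1] + m1(t)) + rng.normal(0, sigma[z])
```

The latent labels z are, of course, discarded before fitting; they are retained here only to make the mixture structure explicit.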
5.1. Performance Assessment
To assess the performance of the non-parametric estimators, we make use of the root of the average squared errors (RASE) over the grid points. For the parametric estimators, we made use of the MSE and the bias.
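Since the displayed formulas are elided in this extract, the conventional forms of these measures can be written as a short Python sketch:

```python
import numpy as np

def rase(m_hat, m_true):
    """Root of the average squared errors of a curve estimate over the grid."""
    return float(np.sqrt(np.mean((m_hat - m_true) ** 2)))

def mse_and_bias(estimates, truth):
    """Monte Carlo MSE and bias of a scalar parameter estimator, given the
    estimates obtained from repeated simulated samples."""
    e = np.asarray(estimates, dtype=float)
    return float(np.mean((e - truth) ** 2)), float(np.mean(e) - truth)

# Example: four hypothetical estimates straddling the true value 1.0
mse, bias = mse_and_bias([0.9, 1.1, 1.0, 1.2], truth=1.0)
```
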
5.2. Simulation Study
The first aim of this simulation study was to illustrate the performance of the proposed one-step backfitting profile likelihood estimators and the profile likelihood plug-in estimators. The data used in this example were generated from model (1) using the two-component setting given in Table 1. To show the effectiveness of the proposed estimation procedure, we considered three bandwidths corresponding to under-smoothing (US), appropriate smoothing (AS) and over-smoothing (OS), respectively, where the appropriate bandwidth is the one selected by the GCV method.
Table 2, Table 3, Table 4 and Table 5 report the averages and standard deviations of the performance measures. The results show that the proposed estimation procedures have generally good performance. The proposed profile likelihood-based estimation procedures provided results similar to those of the PL-EM estimation procedure under all three bandwidths. The regression spline-based estimation procedure performed similarly to the profile likelihood procedures in estimating the parametric part. However, the R-spline-EM procedure generally performed better in estimating the non-parametric part, which shows that it is a good choice for initialising the profile likelihood-based procedures.
Figure 1 presents the non-parametric function estimates based on the profile likelihood-based estimators for a typical sample, together with 95% point-wise bootstrap confidence intervals (CIs). Bootstrap samples were generated from the fitted model, and this process was repeated 1000 times. The component function estimates are virtually the same. Noticeably, the 95% point-wise bootstrap CIs based on the partial-residuals parametric estimator do not contain the points at the boundaries, which might indicate that the non-parametric estimator based on this parametric estimator has boundary bias. This in turn supports the usefulness of the plug-in estimator.
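The point-wise interval construction can be sketched as follows. The synthetic replicates below stand in for refitted bootstrap estimates (in the paper, each replicate comes from refitting the model on a sample generated from the fitted model):

```python
import numpy as np

rng = np.random.default_rng(7)
grid = np.linspace(0, 1, 50)
B = 1000

# Hypothetical bootstrap replicates of one component's curve estimate,
# standing in for B refits of the model on bootstrap samples.
boot = np.sin(2 * np.pi * grid) + rng.normal(0, 0.1, (B, grid.size))

# 95% point-wise percentile intervals at each grid point
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
```

Point-wise percentile intervals of this kind are what Figure 1 overlays on the estimated component functions.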
In terms of computational time, the average run times (in seconds) of the PL-p-EM and PL-EM algorithms were 1.67 versus 7.65, 6.14 versus 31.49, and 7.7 versus 54.65 for sample sizes of 400, 800 and 1000, respectively.
The second aim of this simulation study was to demonstrate the effectiveness of the proposed GCV method for smoothing parameter selection. We used the same sampling settings as in Table 1. We generated 500 samples and, for each sample, obtained the GCV-selected bandwidth over a reasonable range of bandwidths. We then randomly split the 500 selected bandwidths into 10 groups of 50.
Table 6 reports the average and standard deviation for each of the 10 groups for sample sizes of 400 and 800. We can see that the method is consistent, as the variation in the selected bandwidths from one group to another is small, which in turn shows the effectiveness of the method. The last column gives the average and standard deviation of the 10 group averages for each sample size.
7. Discussion and Conclusions
In this section, we give a summary of the paper, a brief discussion of the results, and directions for future research.
7.1. Summary
This paper considered maximum likelihood estimation of the semi-parametric mixture of partial linear models (SPMPLMs) via the EM algorithm. Each mixture component regression function (CRF) consists of a parametric and a non-parametric term. We considered both global and local estimation of the non-parametric term. For the former, we proposed a regression spline-based estimation procedure. For the latter, we first identified the label-switching problem that arises when separately maximising each local-likelihood function. The general solution to this problem is to maximise all the local-likelihood functions simultaneously using the same responsibilities obtained at the E-step of the EM algorithm; thus, a global set of responsibilities must be obtained. The proposed one-step backfitting profile likelihood estimation procedure makes use of the local responsibilities to compute the global responsibilities.
In addition, the non-parametric estimator requires a smoothing parameter. We proposed a data-driven approach using the GCV method to select this parameter. To reduce the computational burden imposed by the partial residuals estimator of the parametric term of the CRF, we proposed a plug-in estimator.
7.2. Discussion of the Results
We demonstrated the performance of the proposed methods through a simulation study. Based on the results, the proposed methods achieved accurate estimation of both the parametric and non-parametric terms of the model, and performed at least as well as the competing methods. In general, the regression spline-based estimator performed better than the profile likelihood estimators in estimating the non-parametric term. Furthermore, the non-parametric estimator based on the plug-in estimator performed better than that based on the partial-residuals estimator. In terms of computational time, the plug-in procedure reduces the computational burden drastically.
To illustrate the practical use of the proposed methods, we used them to estimate SPMPLMs for two climate datasets. For the first dataset, we considered the effects of urbanisation and energy consumption per capita on CO2 emissions. The estimated model clearly identified a mixture structure consisting of two groups of countries. For the first group (top of Figure 3b), carbon emissions increase rapidly as more people move into urban areas and thereafter decline slowly. For the second group (bottom of Figure 3b), carbon emissions increase slowly as urbanisation increases and then decline rapidly with further increases in urbanisation.
For the second dataset, we considered the effects of GDP per capita and the share of total energy from renewable sources on CO2 emissions. We proposed a two-component SPMPLM for these data. The estimated model revealed two groups of countries. For the first group (see Figure 5b), carbon emissions decrease rapidly as per capita GDP increases, followed by an increase and thereafter a further decrease in CO2 with further increases in per capita GDP. For the second group (see Figure 5c), carbon emissions increase up to a point and then decrease, followed by another increase and then a decrease.
For both datasets, the resulting functions are consistent with the environmental Kuznets curve (EKC) hypothesis, which posits that carbon emissions increase with national income up to a turning point and decrease beyond it.
7.3. Future Work
For future studies, it will be of interest to adapt the proposed ideas to estimate more flexible models whose CRFs comprise additive non-parametric functions. The proposed algorithm computes the global estimate of the non-parametric function by selecting the smoothest locally estimated function. This procedure is discrete in that it depends on a single set of local responsibilities. Efforts are now underway to develop methods that continuously combine all the sets of local responsibilities to compute the global responsibilities. Note that the proposed algorithm can also be applied to address label switching when estimating parametric Bayesian mixtures using MCMC procedures. However, this version of the algorithm has factorial complexity, growing with the size of the Markov chain. This gives rise to a further line of research to develop a computationally efficient procedure.