Article

Model Selection for Exponential Power Mixture Regression Models

Department of Statistics and Data Science, College of Economics, Jinan University, Guangzhou 510632, China
*
Author to whom correspondence should be addressed.
Entropy 2024, 26(5), 422; https://doi.org/10.3390/e26050422
Submission received: 27 February 2024 / Revised: 24 April 2024 / Accepted: 14 May 2024 / Published: 15 May 2024
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

Finite mixture of linear regression (FMLR) models are among the most widely used statistical tools for analyzing heterogeneous data. In this paper, we introduce a new procedure that simultaneously determines the number of components and performs variable selection within each regression component of FMLR models via an exponential power error distribution, which includes normal distributions and Laplace distributions as special cases. Under some regularity conditions, the consistency of order selection and of variable selection is established, and the asymptotic normality of the estimators of the non-zero parameters is investigated. In addition, an efficient modified expectation-maximization (EM) algorithm and a majorization-maximization (MM) algorithm are proposed to solve the resulting optimization problem. Furthermore, numerical simulations are used to demonstrate the finite-sample performance of the proposed methodology. Finally, we apply the proposed approach to analyze a baseball salary data set. The results indicate that our proposed method attains a smaller BIC value than the existing method.

1. Introduction

FMLR models are among the most exemplary statistical tools to deal with various heterogeneous data. Since FMLR models were first introduced by [1,2], they are widely applied in many research fields, e.g., machine learning [3], social sciences [4], and business [5]. For more references to FMLR models, see [6,7,8].
There are two important statistical problems in FMLR models: order selection and variable selection within the component regressions. Order selection is usually the first issue to address, and a large body of literature deals with it. For example, Ref. [9] introduced a penalized likelihood method for mixtures of univariate location distributions, and Ref. [10] proposed a penalized likelihood method to select the number of mixing components in finite multivariate Gaussian mixture models. For variable selection within each regression component, Ref. [11] applied subset selection approaches such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) to perform variable selection for each component in a finite mixture of Poisson regression models. To avoid the drawbacks of subset selection, Ref. [12] introduced a penalized likelihood method for variable selection in FMLR models, and Ref. [13] proposed a robust variable selection procedure to estimate and select relevant covariates for FMLR models.
The methods above do not jointly perform order selection and variable selection in FMLR models. This joint problem is challenging, although some work on it exists. Ref. [14] introduced the MR-Lasso for FMLR models to simultaneously identify the number of components and the significant variables, but did not study the large-sample properties of the procedure. Ref. [15] proposed a robust mixture regression estimator via an asymmetric exponential power distribution, and Ref. [16] studied component selection for exponential power mixture models, but neither considered the variable selection problem. Ref. [17] applied penalties to both the number of components and the regression coefficients to conduct model selection for FMLR models, but the error was assumed to follow a normal distribution, so the resulting method is very sensitive to heavy-tailed errors.
In this paper, motivated by [10,18], we propose a new model selection procedure for the FMLR models via an exponential power distribution, which includes normal distributions and Laplace distributions as special cases. Under some regularity conditions, we investigate the asymptotic properties of the proposed method. In addition, we introduce an expectation-maximization (EM) algorithm [19] and a majorization-maximization (MM) algorithm [20] to solve the proposed optimization problem. The finite sample performance of the proposed method is illustrated via some numerical simulations. Results indicate that the proposed method is more robust to the heavy-tailed distributions than the existing method.
The rest of this paper is organized as follows. In Section 2, we present the finite mixture of regression models with an exponential power distribution and a penalized likelihood-based model selection approach, and we investigate the asymptotic properties of the resulting estimates. In Section 3, a modified EM algorithm and an MM algorithm are developed to maximize the penalized likelihood. In Section 4, we propose a data-driven procedure to select the tuning parameters. In Section 5, simulation studies are conducted to evaluate the finite-sample performance of the proposed method. In Section 6, a real data set is analyzed to compare the proposed method with some existing methods. We conclude with some remarks in Section 7. Technical conditions and proofs are given in Appendix A.

2. Methodology

The density function of an exponential power (EP) distribution is defined as follows:
f_p(x; 0, \sigma) = \frac{p}{\Gamma(1/p)\, 2^{1+1/p}\, \sigma} \exp\left\{ -\frac{1}{2} \left| \frac{x}{\sigma} \right|^{p} \right\},
where p > 0 is a shape parameter, σ > 0 is the scale parameter, and Γ(·) is the Gamma function. When 0 < p < 2, the EP distribution is heavy-tailed, which indicates that it can provide protection against outliers. The EP density is a flexible and general class that includes some important statistical densities as special cases, e.g., the Gaussian density (p = 2) and the Laplace density (p = 1). Meanwhile, the EP distribution has a wide range of applications, particularly in the area of business applications [21].
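As a minimal illustration of this density, the following NumPy/SciPy sketch evaluates f_p(x; 0, σ); the function name ep_pdf and its vectorized signature are our own choices for this example, not notation from the paper.

```python
import numpy as np
from scipy.special import gammaln

def ep_pdf(x, p, sigma):
    """Density of the exponential power distribution EP(0, sigma, p).

    f_p(x; 0, sigma) = p / (Gamma(1/p) * 2^(1 + 1/p) * sigma) * exp(-0.5 * |x/sigma|^p)
    """
    x = np.asarray(x, dtype=float)
    log_norm = np.log(p) - gammaln(1.0 / p) - (1.0 + 1.0 / p) * np.log(2.0) - np.log(sigma)
    return np.exp(log_norm - 0.5 * np.abs(x / sigma) ** p)

# Sanity checks: p = 2 recovers the N(0, sigma^2) density, and p = 1 gives a
# Laplace density with scale 2*sigma.
```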
Based on the EP density function, we study the FMLR models. Let Z be a latent class variable with P(Z = j | X) = π_j for j = 1, 2, …, m, where X is a p-dimensional covariate vector. Given Z = j, suppose that the response Y depends on X in a linear way
Y = X^T \beta_j + \epsilon_j,
where β_j is a p-dimensional vector of regression coefficients and ε_j is a random error with EP density f_{p_j}(·; 0, σ_j). Then the conditional density of Y given X can be written as
f(y \mid x) = \sum_{j=1}^{m} \pi_j f_{p_j}( y - x^T \beta_j; 0, \sigma_j ). (1)
Let {(X_1, Y_1), …, (X_n, Y_n)} be a random sample from (1). Then, the log-likelihood function for the observations {(X_1, Y_1), …, (X_n, Y_n)} is given by
Q_n(\theta) = \sum_{i=1}^{n} \log \sum_{j=1}^{m} \pi_j f_{p_j}( Y_i - X_i^T \beta_j; 0, \sigma_j ),
where θ = (β_{11}, …, β_{1p}, …, β_{m1}, …, β_{mp}, σ_1, …, σ_m, p_1, …, p_m, π_1, …, π_{m−1})^T.
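For concreteness, here is a small sketch (assuming NumPy/SciPy; the helper names ep_logpdf and mixture_loglik are ours) that evaluates Q_n(θ) on the log scale, with a log-sum-exp step for numerical stability.

```python
import numpy as np
from scipy.special import gammaln

def ep_logpdf(r, p, sigma):
    """Log-density of EP(0, sigma, p) evaluated at residuals r."""
    return (np.log(p) - gammaln(1.0 / p) - (1.0 + 1.0 / p) * np.log(2.0)
            - np.log(sigma) - 0.5 * np.abs(r / sigma) ** p)

def mixture_loglik(Y, X, pi, beta, sigma, p):
    """Observed-data log-likelihood Q_n(theta); beta has one row per component."""
    logdens = np.column_stack([np.log(pi[j]) + ep_logpdf(Y - X @ beta[j], p[j], sigma[j])
                               for j in range(len(pi))])
    mx = logdens.max(axis=1, keepdims=True)          # log-sum-exp for numerical stability
    return float(np.sum(mx.ravel() + np.log(np.exp(logdens - mx).sum(axis=1))))
```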
To deal with the model selection problem, according to [10], we consider the following objective function,
\tilde{Q}_n(\theta) = Q_n(\theta) - P_{n1}(\theta) - P_{n2}(\theta) (2)
with the penalty functions
P_{n1}(\theta) = n \sum_{j=1}^{m} \sum_{t=1}^{p} p_{\lambda_1}( |\beta_{jt}| ),
P_{n2}(\theta) = n \lambda_2 \sum_{j=1}^{m} \left[ \log\left( \epsilon + p_{\lambda_2}(\pi_j) \right) - \log(\epsilon) \right],
where p_λ(·) is a non-negative and non-decreasing penalty function, ε > 0 is a small constant, and λ_1 > 0 and λ_2 > 0 are two tuning parameters. Thus, we obtain the estimator θ̂_n of θ as follows:
\hat{\theta}_n = \arg\max_{\theta} \tilde{Q}_n(\theta). (3)
To derive some theoretical properties of the estimators θ ^ n , we first define
a_n = \max_{j,t} \left\{ p'_{\lambda_1}( |\beta_{jt0}| ),\; p'_{\lambda_2}( \pi_{j0} ) : \beta_{jt0} \neq 0,\; \pi_{j0} \neq 0 \right\},
b_n = \max_{j,t} \left\{ p''_{\lambda_1}( |\beta_{jt0}| ),\; p''_{\lambda_2}( \pi_{j0} ) : \beta_{jt0} \neq 0,\; \pi_{j0} \neq 0 \right\},
where p'_λ(h) and p''_λ(h) denote the first and second derivatives of the function p_λ(h) with respect to h. To establish the asymptotic properties of the proposed estimators, we assume the following regularity conditions:
(C1)
For any λ, p_λ(0) = 0, and p_λ(·) is non-negative and symmetric. Furthermore, it is non-decreasing and twice differentiable on (0, ∞) with at most a few exceptions.
(C2)
As n → ∞, b_n = o(1).
(C3)
\lim_{n \to \infty} \inf_{0 < h \le n^{-1/2} \log n} \sqrt{n}\, p'_{\lambda}(h) = \infty.
(C4)
The joint density f(z; θ) of Z = (X, Y) has third-order partial derivatives with respect to θ for almost all z.
(C5)
For each θ_0, there exist functions R_1(z) and R_2(z) such that, for θ in a neighborhood N(θ_0) of θ_0,
\left| \frac{\partial f(z; \theta)}{\partial \theta_i} \right| \le R_1(z), \quad \left| \frac{\partial^2 f(z; \theta)}{\partial \theta_i \partial \theta_j} \right| \le R_1(z), \quad \left| \frac{\partial^3 f(z; \theta)}{\partial \theta_i \partial \theta_j \partial \theta_k} \right| \le R_2(z),
where θ_0 is the true parameter, and R_1(z) and R_2(z) satisfy ∫ R_1(z) dz < ∞ and ∫ R_2(z) f(z; θ) dz < ∞.
(C6)
The Fisher information matrix Q ( θ ) is finite and positive definite at θ = θ 0 , where Q ( θ ) is defined as follows,
Q(\theta) = E_{\theta}\!\left[ \frac{\partial \log f(Z; \theta)}{\partial \theta} \left( \frac{\partial \log f(Z; \theta)}{\partial \theta} \right)^{T} \right].
(C7)
p_j > 1, j = 1, …, m.
(C8)
0 < c_1 ≤ σ_j² ≤ c_2 and ||β_j|| ≤ c_3 for j = 1, …, m, where c_1 is a positive constant and c_2, c_3 are large constants.
Remark 1.
Conditions (C1)–(C3) are assumptions on the penalty function and ensure the consistency of the variable selection of the proposed estimators; similar conditions are used in [22]. Condition (C5) ensures that the main term dominates the remainder in the Taylor expansion. Conditions (C4)–(C6) are also used in [17]. Condition (C7) ensures the concavity of the likelihood function, since the log-likelihood of a random sample from the EP distribution is concave when p > 1. Condition (C8) ensures the compactness of the parameter space. Conditions (C7) and (C8) are similarly applied in Wang and Feng [16].
In the following, we state two theorems whose proofs are given in Appendix A.
Theorem 1.
Under conditions (C1), (C2), and (C4)–(C8), if √n min{λ_1, λ_2} → ∞ and min{λ_1, λ_2} → 0, then there exists a local maximizer θ̂_n of the penalized log-likelihood function (2) such that
\| \hat{\theta}_n - \theta_0 \| = O_p( n^{-1/2} ).
Theorem 2.
Under conditions (C1)–(C8), if √n min{λ_1, λ_2} → ∞ and min{λ_1, λ_2} → 0, then for any √n-consistent estimator θ̂_n of θ, we have:
(a)
Sparsity: P{π̂_k = 0} → 1 as n → ∞ for k = m_0 + 1, …, m.
(b)
Sparsity: P{β̂_{kj} = 0} → 1 as n → ∞ for k = 1, …, m_0 and j = t_k + 1, …, p, where t_k is the number of true non-zero regression coefficients in the k-th component.
(c)
Asymptotic normality:
\sqrt{n} \left( Q_1(\theta_{01}) + \frac{ P''_{n1}(\theta_{01}) }{ n } + \frac{ P''_{n2}(\theta_{01}) }{ n } \right) ( \hat{\theta}_{n1} - \theta_{01} ) + \frac{ P'_{n1}(\theta_{01}) + P'_{n2}(\theta_{01}) }{ \sqrt{n} } \xrightarrow{D} N( 0, Q_1(\theta_{01}) ),
where m 0 is the number of true non-zero mixing weights, θ 01 and Q 1 ( θ 01 ) are the true parameter and the corresponding Fisher information when all zero effects are removed, respectively.

3. Algorithm

In this section, we apply a modified EM algorithm and an MM algorithm to solve the optimization problem (3). Let z_{ij} be the unobserved indicator variable that equals one if the i-th observation arises from the j-th component (treated as missing data), and let p_{ij} be the posterior probability that the i-th observation belongs to the j-th component. The expected complete-data log-likelihood function is then given as follows:
\sum_{i=1}^{n} \sum_{j=1}^{m} z_{ij} \log\left[ \pi_j f_{p_j}( Y_i - X_i^T \beta_j; 0, \sigma_j ) \right].
Then, the objective function (2) is rewritten as
\sum_{i=1}^{n} \sum_{j=1}^{m} p_{ij} \log\left[ \pi_j f_{p_j}( Y_i - X_i^T \beta_j; 0, \sigma_j ) \right] - P_{n1}(\theta) - P_{n2}(\theta). (4)
Next, we apply a modified EM algorithm to maximize the objective function (4). The detailed procedure is given as follows:
Step 1 Given the l-th approximation
\hat{\theta}^{(l)} = \left( \hat{\beta}_{11}^{(l)}, \ldots, \hat{\beta}_{1p}^{(l)}, \ldots, \hat{\beta}_{m1}^{(l)}, \ldots, \hat{\beta}_{mp}^{(l)}, \hat{\sigma}_1^{(l)}, \ldots, \hat{\sigma}_m^{(l)}, \hat{p}_1^{(l)}, \ldots, \hat{p}_m^{(l)}, \hat{\pi}_1^{(l)}, \ldots, \hat{\pi}_{m-1}^{(l)} \right),
we can calculate the classification probabilities:
\hat{p}_{ij}^{(l+1)} = \frac{ \hat{\pi}_j^{(l)} f_{\hat{p}_j^{(l)}}( Y_i - X_i^T \hat{\beta}_j^{(l)}; 0, \hat{\sigma}_j^{(l)} ) }{ \sum_{k=1}^{m} \hat{\pi}_k^{(l)} f_{\hat{p}_k^{(l)}}( Y_i - X_i^T \hat{\beta}_k^{(l)}; 0, \hat{\sigma}_k^{(l)} ) }.
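A minimal sketch of this E-step (assuming NumPy/SciPy; e_step and ep_logpdf are illustrative names of ours), computed on the log scale so that the normalization is numerically stable:

```python
import numpy as np
from scipy.special import gammaln

def e_step(Y, X, pi, beta, sigma, p):
    """E-step: posterior probability that observation i belongs to component j."""
    def ep_logpdf(r, pj, sj):
        return (np.log(pj) - gammaln(1.0 / pj) - (1.0 + 1.0 / pj) * np.log(2.0)
                - np.log(sj) - 0.5 * np.abs(r / sj) ** pj)

    logw = np.column_stack([np.log(pi[j]) + ep_logpdf(Y - X @ beta[j], p[j], sigma[j])
                            for j in range(len(pi))])
    logw -= logw.max(axis=1, keepdims=True)       # stabilize before exponentiating
    w = np.exp(logw)
    return w / w.sum(axis=1, keepdims=True)
```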
Step 2 We first update {π_1, …, π_m}. We use a Lagrange multiplier δ to account for the constraint \sum_{j=1}^{m} \pi_j = 1; that is, we solve
\frac{\partial}{\partial \pi_j} \left[ \sum_{i=1}^{n} \sum_{j=1}^{m} \hat{p}_{ij}^{(l+1)} \log(\pi_j) - n \lambda_2 \sum_{j=1}^{m} \log\left( \epsilon + p_{\lambda_2}(\pi_j) \right) - \delta \left( \sum_{j=1}^{m} \pi_j - 1 \right) \right] = 0. (5)
In (5), we apply the local linear approximation [23] to log(ε + p_{λ_2}(π_j)):
\log\left( \epsilon + p_{\lambda_2}(\pi_j) \right) \approx \log\left( \epsilon + p_{\lambda_2}( \hat{\pi}_j^{(l)} ) \right) + \frac{ p'_{\lambda_2}( \hat{\pi}_j^{(l)} ) }{ \epsilon + p_{\lambda_2}( \hat{\pi}_j^{(l)} ) } \left( \pi_j - \hat{\pi}_j^{(l)} \right).
Then, π j can be updated by straightforward calculations,
\hat{\pi}_j^{(l+1)} = \frac{1}{D_j} \sum_{i=1}^{n} \hat{p}_{ij}^{(l+1)},
where
D_j = n \left[ 1 - \lambda_2 \sum_{k=1}^{m} \hat{\pi}_k^{(l)} \frac{ p'_{\lambda_2}( \hat{\pi}_k^{(l)} ) }{ \epsilon + p_{\lambda_2}( \hat{\pi}_k^{(l)} ) } + \lambda_2 \frac{ p'_{\lambda_2}( \hat{\pi}_j^{(l)} ) }{ \epsilon + p_{\lambda_2}( \hat{\pi}_j^{(l)} ) } \right].
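The following sketch implements this weight update; pen and pen_deriv stand for vectorized evaluations of p_{λ_2} and p'_{λ_2} (e.g., a SCAD penalty), and the final renormalization is only a numerical safeguard of ours, not part of the derivation.

```python
import numpy as np

def update_pi(post, pi_old, lam2, eps, pen, pen_deriv):
    """M-step update of the mixing weights.

    post:           (n, m) posterior probabilities from the E-step
    pi_old:         (m,) current mixing weights
    pen, pen_deriv: vectorized callables evaluating p_lambda2 and its derivative
    """
    n = post.shape[0]
    ratio = pen_deriv(pi_old) / (eps + pen(pi_old))       # p'(pi_j) / (eps + p(pi_j))
    D = n * (1.0 - lam2 * np.sum(pi_old * ratio) + lam2 * ratio)
    pi_new = post.sum(axis=0) / D
    return pi_new / pi_new.sum()                          # numerical safeguard only
```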
Next, we update {β_11, …, β_1p, …, β_m1, …, β_mp, σ_1, …, σ_m, p_1, …, p_m} by maximizing the following objective function,
\sum_{i=1}^{n} \sum_{j=1}^{m} \hat{p}_{ij}^{(l+1)} \log\left[ \hat{\pi}_j^{(l+1)} f_{p_j}( Y_i - X_i^T \beta_j; 0, \sigma_j ) \right] - n \sum_{j=1}^{m} \sum_{t=1}^{p} p_{\lambda_1}( |\beta_{jt}| ).
We first update σ_1, …, σ_m. For each σ_j, j = 1, 2, …, m, we only need to maximize
\sum_{i=1}^{n} \hat{p}_{ij}^{(l+1)} \left[ - \log( \sigma_j ) - \frac{1}{2} \left| \frac{ Y_i - X_i^T \hat{\beta}_j^{(l)} }{ \sigma_j } \right|^{ \hat{p}_j^{(l)} } \right].
Then, the resulting estimator is given as follows:
\hat{\sigma}_j^{(l+1)} = \left[ \frac{ \hat{p}_j^{(l)} \sum_{i=1}^{n} \hat{p}_{ij}^{(l+1)} \left| Y_i - X_i^T \hat{\beta}_j^{(l)} \right|^{ \hat{p}_j^{(l)} } }{ 2 \sum_{i=1}^{n} \hat{p}_{ij}^{(l+1)} } \right]^{ 1 / \hat{p}_j^{(l)} }.
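A corresponding sketch of this closed-form update (the factor p̂_j and the 1/p̂_j root come from setting the derivative of the weighted EP log-likelihood with respect to σ_j to zero; the helper name is ours):

```python
import numpy as np

def update_sigma(Y, X, post_j, beta_j, p_j):
    """M-step update of sigma_j for one component.

    post_j: (n,) posterior probabilities of component j from the E-step
    """
    resid = np.abs(Y - X @ beta_j)
    num = p_j * np.sum(post_j * resid ** p_j)
    den = 2.0 * np.sum(post_j)
    return (num / den) ** (1.0 / p_j)
```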
Next, we update p_1, …, p_m. For each p_j, j = 1, 2, …, m, according to condition (C7), we have
\hat{p}_j^{(l+1)} = \arg\max_{p_j > 1} \sum_{i=1}^{n} \hat{p}_{ij}^{(l+1)} \left[ \log( p_j ) - \log \Gamma\!\left( \tfrac{1}{p_j} \right) - \left( 1 + \tfrac{1}{p_j} \right) \log(2) - \frac{1}{2} \left| \frac{ Y_i - X_i^T \hat{\beta}_j^{(l)} }{ \hat{\sigma}_j^{(l+1)} } \right|^{ p_j } \right].
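This one-dimensional problem can be solved numerically; the sketch below uses a bounded scalar search (the upper bound of 4 is an arbitrary safeguard we introduce for the illustration, not a value from the paper).

```python
import numpy as np
from scipy.special import gammaln
from scipy.optimize import minimize_scalar

def update_p(Y, X, post_j, beta_j, sigma_j, upper=4.0):
    """Bounded 1-D search for p_j > 1 maximizing the weighted EP log-likelihood."""
    resid = np.abs(Y - X @ beta_j) / sigma_j

    def neg_obj(p):
        return -np.sum(post_j * (np.log(p) - gammaln(1.0 / p)
                                 - (1.0 + 1.0 / p) * np.log(2.0)
                                 - 0.5 * resid ** p))

    res = minimize_scalar(neg_obj, bounds=(1.0 + 1e-6, upper), method="bounded")
    return res.x
```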
Finally, we update β_1, …, β_m. Ignoring terms that do not involve β_j, we have
L(\beta_j) = - \sum_{i=1}^{n} \hat{p}_{ij}^{(l+1)} \frac{ 1 }{ 2 ( \hat{\sigma}_j^{(l+1)} )^{ \hat{p}_j^{(l+1)} } } \left| Y_i - X_i^T \beta_j \right|^{ \hat{p}_j^{(l+1)} } - n \sum_{t=1}^{p} p_{\lambda_1}( |\beta_{jt}| ).
Applying the MM algorithm to the first term of L(β_j), we majorize the power of the squared residual as
\left[ ( Y_i - X_i^T \beta_j )^T ( Y_i - X_i^T \beta_j ) \right]^{ \hat{p}_j^{(l+1)} / 2 } \le \left[ ( Y_i - X_i^T \hat{\beta}_j^{(l)} )^T ( Y_i - X_i^T \hat{\beta}_j^{(l)} ) \right]^{ \hat{p}_j^{(l+1)} / 2 } + \frac{ \hat{p}_j^{(l+1)} }{ 2 } \left[ ( Y_i - X_i^T \hat{\beta}_j^{(l)} )^T ( Y_i - X_i^T \hat{\beta}_j^{(l)} ) \right]^{ \hat{p}_j^{(l+1)} / 2 - 1 } \left[ ( Y_i - X_i^T \beta_j )^T ( Y_i - X_i^T \beta_j ) - ( Y_i - X_i^T \hat{\beta}_j^{(l)} )^T ( Y_i - X_i^T \hat{\beta}_j^{(l)} ) \right].
For p_{λ_1}(|β_{jt}|), we apply the local quadratic approximation [22]:
p_{\lambda_1}( |\beta_{jt}| ) \approx p_{\lambda_1}( |\hat{\beta}_{jt}^{(l)}| ) + \frac{ p'_{\lambda_1}( |\hat{\beta}_{jt}^{(l)}| ) }{ 2 |\hat{\beta}_{jt}^{(l)}| } \left( \beta_{jt}^2 - ( \hat{\beta}_{jt}^{(l)} )^2 \right).
Thus, for each β_j, j = 1, 2, …, m, we only need to solve the following minimization problem
\hat{\beta}_j^{(l+1)} = \arg\min_{\beta_j} \sum_{i=1}^{n} \hat{p}_{ij}^{(l+1)} \frac{ \hat{w}_{ij}^{(l)} }{ 2 ( \hat{\sigma}_j^{(l+1)} )^{ \hat{p}_j^{(l+1)} } } ( Y_i - X_i^T \beta_j )^T ( Y_i - X_i^T \beta_j ) + n \sum_{t=1}^{p} \beta_{jt}^2 \frac{ p'_{\lambda_1}( |\hat{\beta}_{jt}^{(l)}| ) }{ 2 |\hat{\beta}_{jt}^{(l)}| },
where \hat{w}_{ij}^{(l)} = \frac{ \hat{p}_j^{(l+1)} }{ 2 } \left[ ( Y_i - X_i^T \hat{\beta}_j^{(l)} )^T ( Y_i - X_i^T \hat{\beta}_j^{(l)} ) \right]^{ \hat{p}_j^{(l+1)} / 2 - 1 }.
Thus, we can update β j as follows
\hat{\beta}_j^{(l+1)} = ( X B X^T + A )^{-1} X B Y,
where
A = n\, \mathrm{diag}\left( \frac{ p'_{\lambda_1}( |\hat{\beta}_{j1}^{(l)}| ) }{ 2 |\hat{\beta}_{j1}^{(l)}| }, \ldots, \frac{ p'_{\lambda_1}( |\hat{\beta}_{jp}^{(l)}| ) }{ 2 |\hat{\beta}_{jp}^{(l)}| } \right), \quad B = \mathrm{diag}\left( \frac{ \hat{p}_{1j}^{(l+1)} \hat{w}_{1j}^{(l)} }{ 2 ( \hat{\sigma}_j^{(l+1)} )^{ \hat{p}_j^{(l+1)} } }, \ldots, \frac{ \hat{p}_{nj}^{(l+1)} \hat{w}_{nj}^{(l)} }{ 2 ( \hat{\sigma}_j^{(l+1)} )^{ \hat{p}_j^{(l+1)} } } \right),
with X = (X_1, …, X_n) the p × n matrix of covariates and Y = (Y_1, …, Y_n)^T.
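A sketch of this weighted ridge-type update for one component, written with the more common (n, d) design-matrix convention so the solve becomes (XᵀBX + A)⁻¹XᵀBY; the SCAD derivative helper and the small stabilizing constants are our own additions for the illustration.

```python
import numpy as np

def scad_deriv(t, lam, a=3.7):
    """First derivative of the SCAD penalty of Fan and Li (2001)."""
    t = np.abs(t)
    return lam * ((t <= lam) + np.maximum(a * lam - t, 0.0) / ((a - 1.0) * lam) * (t > lam))

def update_beta(Y, X, post_j, beta_old, sigma_j, p_j, lam1, delta=1e-8):
    """MM/LQA update of beta_j via a single weighted ridge-type solve."""
    resid2 = (Y - X @ beta_old) ** 2 + 1e-12               # squared residuals at current beta
    w = 0.5 * p_j * resid2 ** (0.5 * p_j - 1.0)             # MM weights w_ij
    b = post_j * w / (2.0 * sigma_j ** p_j)                 # diagonal of B
    n = len(Y)
    a_diag = n * scad_deriv(beta_old, lam1) / (2.0 * np.abs(beta_old) + delta)  # diagonal of A
    XtB = X.T * b                                           # (d, n) = X^T B
    return np.linalg.solve(XtB @ X + np.diag(a_diag), XtB @ Y)
```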
Step 3 Repeat Step 1 and Step 2 until convergence.

4. Choice of the Tuning Parameters

The selection of the tuning parameters is a vital part of the order selection and variable selection procedure. To guarantee that the true model can be identified, proper tuning parameters λ_1 and λ_2 must be chosen in practice. There are many methods for selecting λ_1 and λ_2, such as cross-validation (CV), generalized cross-validation (GCV), AIC, and BIC.
As suggested in [24], we introduce a data-driven procedure to choose the tuning parameters λ 1 and λ 2 by minimizing the following modified Bayesian information criterion,
MBIC( \lambda_1, \lambda_2 ) = -2 \sum_{i=1}^{n} \log \sum_{j=1}^{\hat{m}} \hat{\pi}_j f_{\hat{p}_j}( Y_i - X_i^T \hat{\beta}_j; 0, \hat{\sigma}_j ) + \log(n) \cdot df,
where m̂ denotes the estimated number of components, df = 3 m̂ − 1 + M̂_β, and
\hat{M}_{\beta} = \#\left\{ |\hat{\beta}_{jt}| > 10^{-3},\; j = 1, \ldots, \hat{m},\; t = 1, \ldots, p \right\}.
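A small sketch of this criterion (function and argument names are ours; the maximized log-likelihood would be supplied externally, e.g., from the mixture_loglik sketch in Section 2):

```python
import numpy as np

def mbic(loglik, pi_hat, beta_hat, n, tol=1e-3):
    """Modified BIC: loglik is the maximized log-likelihood, pi_hat the estimated
    mixing weights, beta_hat the (m, p) matrix of estimated coefficients."""
    keep = pi_hat > 0                                # components kept by the penalty
    m_hat = int(keep.sum())
    M_beta = int(np.sum(np.abs(beta_hat[keep]) > tol))
    df = 3 * m_hat - 1 + M_beta
    return -2.0 * loglik + np.log(n) * df
```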

5. Simulation

In this section, we use some numerical simulations to illustrate the finite sample performance of the proposed method. For the penalty function, we use the SCAD penalty [22], which is given as follows:
p_{\lambda}(t; a) = \begin{cases} \lambda |t|, & \text{if } |t| \le \lambda, \\ -\dfrac{ t^2 - 2 a \lambda |t| + \lambda^2 }{ 2 (a - 1) }, & \text{if } \lambda < |t| \le a \lambda, \\ (a + 1) \lambda^2 / 2, & \text{otherwise}, \end{cases}
where λ is a tuning parameter and a > 2. Following the suggestion in Fan and Li [22], we set a = 3.7, which approximately minimizes the Bayes risk. The datasets are generated from a three-component FMLR model
f( y \mid x ) = \sum_{j=1}^{3} \pi_j f_{p_j}( y - x^T \beta_j; 0, \sigma_j ), (7)
where x is generated from the 7-dimensional standard normal distribution (i.e., its components are independent N(0, 1) variables). In detail, we generate the random sample of each component from the linear model
Y = X^T \beta + \epsilon.
We simulate 100 datasets from the FMLR model (7) with sample sizes n = 200, 600, 800, and 1000. The datasets are generated under the following four scenarios (a data-generation sketch is given after the list):
Scenario 1. β_1 = (1, 1, 1, 1, 0, 0, 0)^T, β_2 = (1, 2, 3, 4, 0, 0, 0)^T, β_3 = (5, 6, 7, 8, 0, 0, 0)^T, π_1 = 0.4, π_2 = 0.3, π_3 = 0.3, and the random error ε ~ N(0, 1);
Scenario 2. We use the same setting as in Scenario 1, except that the error term follows a t-distribution with 2 degrees of freedom;
Scenario 3. We use the same setting as in Scenario 1, except that the error term follows a mixture of t distributions: ε ~ 0.5 t(1) + 0.5 t(3);
Scenario 4. We use the same setting as in Scenario 1, except that the error term follows a mixture normal distribution: ε ~ 0.95 N(0, 1) + 0.05 N(0, 5²).
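As referenced above, a minimal sketch of the data-generating mechanism for Scenario 1 (Scenarios 2–4 only change the error draw); the function name and the fixed seed are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_scenario1(n):
    """Generate one dataset from the three-component FMLR model of Scenario 1."""
    beta = np.array([[1, 1, 1, 1, 0, 0, 0],
                     [1, 2, 3, 4, 0, 0, 0],
                     [5, 6, 7, 8, 0, 0, 0]], dtype=float)
    pi = np.array([0.4, 0.3, 0.3])
    X = rng.standard_normal((n, 7))
    z = rng.choice(3, size=n, p=pi)                 # latent component labels
    eps = rng.standard_normal(n)                    # N(0, 1) errors in Scenario 1
    Y = np.einsum("ij,ij->i", X, beta[z]) + eps     # row-wise X_i^T beta_{z_i}
    return X, Y, z
```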
We compare our proposed method with the method proposed by [17]. To assess the finite-sample performance, we consider four different measures (a small sketch of how they are computed follows the list):
(1)
RMSE_{π_j}: the root mean square error of π̂_j when the order is correctly estimated, which is defined by
RMSE_{\pi_j} = \sqrt{ \frac{1}{M^*} \sum_{m=1}^{M^*} ( \hat{\pi}_j^{(m)} - \pi_j )^T ( \hat{\pi}_j^{(m)} - \pi_j ) },
where M* is the number of simulations in which the order is correctly estimated.
(2)
RMSE_{β_c}: the root mean square error of β̂_j, calculated analogously to RMSE_{π_j}.
(3)
NCZ (the number of correct zeros): the number of parameters whose true value is zero and that are correctly estimated as zero, calculated by
NCZ = \#\left\{ t : \beta_t = 0 \text{ and } \hat{\beta}_t = 0 \right\},
where # { A } denotes the number of elements within A.
(4)
NIZ (the number of incorrect zeros): the number of parameters whose true value is non-zero but that are incorrectly estimated as zero, given by
NIZ = \#\left\{ t : \beta_t \neq 0 \text{ and } \hat{\beta}_t = 0 \right\}.
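As referenced above, a short sketch of these selection measures for a single replication (the 10⁻³ threshold mirrors the one used for M̂_β; names are ours):

```python
import numpy as np

def selection_measures(beta_hat, beta_true, tol=1e-3):
    """NCZ and NIZ for one replication; coefficients below tol count as zero."""
    est_zero = np.abs(beta_hat) <= tol
    true_zero = beta_true == 0
    ncz = int(np.sum(true_zero & est_zero))     # correctly estimated zeros
    niz = int(np.sum(~true_zero & est_zero))    # non-zeros wrongly shrunk to zero
    return ncz, niz

def rmse(estimates, truth):
    """Root mean square error over the replications with correctly selected order."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((estimates - truth) ** 2)))
```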
In the simulation studies, we assume that the data come from a mixture regression model with at most five components, so the true number of components must be estimated. For each scenario, the simulation is repeated 100 times. The corresponding results are shown in Table 1, Table 2, Table 3, Table 4, Table 5, Table 6, Table 7 and Table 8, in which M_1 and M_2 denote the method of [17] and our proposed method, respectively.
Table 1 shows the simulation results for order selection. Columns labeled “Underfitted” report the proportion of fitted models with fewer than three components over the 100 simulations; “Correctly fitted” and “Overfitted” are interpreted analogously. From Table 1, we find that the two methods behave very similarly, and the accuracy of order selection reaches more than 98% for both M_1 and M_2 when n is larger than or equal to 600. Table 2 presents the results of variable selection and parameter estimation for each component. From Table 2, we observe that the finite-sample performances of the two methods are very similar for n ≥ 600. Therefore, when the error term follows a normal distribution, the two methods perform similarly once the sample size is sufficiently large.
Table 3 and Table 4 present the results for Scenario 2, a heavy-tailed setting. We can observe from Table 3 that M_1 still underfits or overfits the number of components in roughly 20% of replications even for large n, while our method remains robust and maintains at least 98% accuracy when n ≥ 600. In Table 4, M_1 performs poorly in variable selection: it has many non-zero NIZ values, while the NIZ of our method is zero for all n ≥ 600. Meanwhile, the NCZ of our proposed method increases as n increases. In addition, our proposed method has a smaller RMSE than M_1.
Table 5 shows that the order selection performance of M_1 is worse than that of M_2. The proportion of correctly fitted models remains at least 98% with our method for n ≥ 600, while M_1 tends to overfit the number of components. In Table 6, the NCZ value of M_1 is slightly better than that of M_2. Comparing RMSE_{β_c}, we find that our method is consistently better than M_1.
Table 7 and Table 8 present the results for Scenario 4. M_1 rarely identifies the right number of components, whereas our method selects the correct number of components with at least 98% accuracy for n ≥ 600. In Table 8, M_1 is better than M_2 in NCZ, but it is unstable in NIZ. Comparing RMSE_{β_c}, we find that M_1 is larger than M_2. In general, our method outperforms M_1 in order selection, variable selection, and parameter estimation.

6. Real Data Analysis

In this section, we apply the proposed methodology to analyze baseball salary data, which consists of information about major league baseball players. The response variable is their 1992 salaries (measured in thousands of dollars). In addition, there are 16 performance measures for 337 MLB players who participated in at least one game in both the 1991 and 1992 seasons. This data set has been analyzed by others, such as [12,17]. We want to study how the performance measures affect salaries using our method.
The performance measures are batting average (x_1), on-base percentage (x_2), runs (x_3), hits (x_4), doubles (x_5), triples (x_6), home runs (x_7), runs batted in (x_8), walks (x_9), strikeouts (x_10), stolen bases (x_11), and errors (x_12); and indicators of free agency eligibility (x_13), free agent in 1991/2 (x_14), arbitration eligibility (x_15), and arbitration in 1991/2 (x_16). The four dummy variables x_13–x_16 indicate how free each player was to move to another team. As suggested in [25], the interaction effects between the dummy variables x_13–x_16 and the quantitative variables x_1, x_3, x_7, and x_8 should also be considered. Therefore, we obtain a set of 32 potential covariates affecting each player's salary. Ref. [12] fitted a mixture of linear regression models with two or three components to depict the overlaid shape of the histogram of log(salary) and concluded that a two-component mixture regression model (labeled MIXSCAD) fitted the data well. Ref. [17] used an FMLR model based on the normal distribution and also selected two components.
As advocated by [12], we use log(salary) as the response variable. We first fit a linear model via stepwise regression; the results are shown in Table 9 and denoted by β̂_ols. Following [17], we consider the following four-component mixture model,
Y \mid X \sim \sum_{j=1}^{4} \pi_j f_{p_j}( Y - X^T \beta_j; 0, \sigma_j ),
where Y = log(salary) and X is a 33 × 1 vector containing 32 covariates plus an intercept term. In order to implement the proposed modified EM algorithm, we set the initial values as follows
\pi_0 = (0.4, 0.2, 0.2, 0.2)^T, \quad \sigma_0 = (10, 10, 10, 10)^T, \quad p_0 = (1, 1, 1, 1)^T, \quad \beta_{j0} = \hat{\beta}_{ols} + \epsilon_j,
where ε_j ~ N(0, I), j = 1, 2, 3, 4. The results are reported in Table 9 and Table 10. From Table 9, we find that both M_1 and M_2 choose two components. Furthermore, we observe from Table 10 that M_2 has a smaller BIC value than M_1, which indicates that our proposed method fits this dataset better than M_1.
It is of interest to explain how the performance measures affect salaries by interpreting the fitted model, although such interpretation can be a source of controversy. Intuitively, many performance measures should be positively correlated with a player's salary. M_1 and M_2 give coefficients of the same sign and similar magnitude for x_0, x_13, x_15, and the interaction of x_8 and x_14. Recall that x_1 and x_7 measure individual performance, while x_13, x_15, and x_16 are dummy variables indicating how freely players can change teams. For example, the effect of x_1 x_16 suggests that, for players who went through arbitration in 1991/2, individual ability (x_1) is associated with a lower salary, whereas the value of their team contribution (x_8) is not.
The main differences between the two models lie in the interaction effects x_1 x_14 and x_1 x_15. M_1 disregards both effects, whereas M_2 picks them up in different components and, in particular, attaches great importance to the interaction effect x_1 x_14.

7. Discussion

In this paper, we introduced FMLR models with an exponential power error distribution. Under some regularity conditions, the asymptotic properties of the proposed estimators were established. Meanwhile, a modified EM algorithm and an MM algorithm were applied to solve the proposed optimization problem. Furthermore, the merits of the proposed methodology were illustrated through numerical simulations and a real data analysis. The simulation studies showed that the proposed method performs better than the existing method under different error distributions. In the analysis of the baseball salary dataset, our proposed method attained a smaller BIC value than the method proposed in [17].

Author Contributions

Methodology, Y.J. and J.L.; Formal analysis, H.Z. and X.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research is partially supported by NSFC (12171203), the Fundamental Research Funds for the Central Universities (23JNQMX21) and the Natural Science Foundation of Guangdong (2022A1515010045).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data used in this study are publicly available. Code is available on request from the second author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Proof of Theorem 1.
For any given ε > 0, let ||u|| = M_ε. Denote
\Gamma_n(u) = \tilde{Q}_n( \theta_0 + u/\sqrt{n} ) - \tilde{Q}_n( \theta_0 ).
According to (2), we have
\Gamma_n(u) = \left[ Q_n( \theta_0 + u/\sqrt{n} ) - Q_n(\theta_0) \right] - \left[ P_{n1}( \theta_0 + u/\sqrt{n} ) - P_{n1}(\theta_0) \right] - \left[ P_{n2}( \theta_0 + u/\sqrt{n} ) - P_{n2}(\theta_0) \right].
Under condition (C1), we have p_λ(0) = 0 for any λ. Therefore, P_{n1}(θ_0) = P_{n1}(θ_{01}) and P_{n2}(θ_0) = P_{n2}(θ_{01}). Since P_{n1}(θ_0 + u/√n) and P_{n2}(θ_0 + u/√n) are sums of non-negative terms, we then have
\Gamma_n(u) \le \left[ Q_n( \theta_0 + u/\sqrt{n} ) - Q_n(\theta_0) \right] - \left[ P_{n1}( \theta_{01} + u_1/\sqrt{n} ) - P_{n1}(\theta_{01}) \right] - \left[ P_{n2}( \theta_{01} + u_1/\sqrt{n} ) - P_{n2}(\theta_{01}) \right],
where u 1 is a subvector of u with the corresponding non-zero coefficients.
By conditions (C4), (C5), (C7) and (C8), and a Taylor expansion, we have
Q_n( \theta_0 + u/\sqrt{n} ) - Q_n(\theta_0) = n^{-1/2} Q'_n(\theta_0)^T u - \frac{1}{2} u^T Q(\theta_0) u \,(1 + o_p(1)).
By condition (C1), a Taylor expansion, the triangle inequality, and the Cauchy–Schwarz inequality, we have
P_{n1}( \theta_{01} + u_1/\sqrt{n} ) - P_{n1}(\theta_{01}) = n \sum_{k=1}^{m_0} \sum_{j=1}^{t_k} \left[ p_{\lambda_1}( | \beta_{kj} + u_{kj}/\sqrt{n} | ) - p_{\lambda_1}( | \beta_{kj} | ) \right] \le m_0 t \left[ \sqrt{n}\, a_n \| u \| + \frac{ b_n }{ 2 } \| u \|^2 \right] (1 + o(1)),
where m_0 is the number of true non-zero mixing weights, t = max_k t_k, and t_k is the number of true non-zero regression coefficients in the k-th component.
Since √n λ_2 → ∞ and λ_2 → 0, we have, for sufficiently large n,
\left| P_{n2}( \theta_{01} + u_1/\sqrt{n} ) - P_{n2}(\theta_{01}) \right| = 0.
Regularity condition (C6) implies that n^{-1/2} Q'_n(θ_0) = O_p(1). Since √n min{λ_1, λ_2} → ∞ and min{λ_1, λ_2} → 0, we have a_n = 0 for sufficiently large n. By conditions (C2) and (C6), for any given ε > 0, there exists a sufficiently large M_ε such that
\lim_{n \to \infty} P\left( \sup_{ \| u \| = M_\epsilon } \Gamma_n(u) < 0 \right) \ge 1 - \epsilon.
Therefore, with probability at least 1 − ε, there is a local maximizer in { θ_0 + u/√n : ||u|| ≤ M_ε }. That is, this local maximizer θ̂_n satisfies ||θ̂_n − θ_0|| = O_p(1/√n). This completes the proof of Theorem 1. □
Proof of Theorem 2.
We first show that π̂_k = 0 for k = m_0 + 1, …, m. Since ||θ̂_n − θ_0|| = O_p(n^{-1/2}), we have π̂_k = O_p(1/√n) for k = m_0 + 1, …, m. To prove (a), it is sufficient to show that, with probability tending to 1 as n → ∞, for any π_k satisfying π̂_k − π_k = O_p(1/√n) and k = m_0 + 1, …, m,
\frac{ \partial Q^*(\theta) }{ \partial \pi_k } \Big|_{ \pi_k = \hat{\pi}_k } < 0 \quad \text{for } 0 < \hat{\pi}_k < C/\sqrt{n}, (A1)
where C is a positive constant,
Q^*(\theta) = \tilde{Q}_n(\theta) - \delta \left( \sum_{k=1}^{m} \pi_k - 1 \right),
and δ is a Lagrange multiplier. Therefore, π̂_k, k = 1, …, m, should satisfy
\frac{ \partial Q^*(\theta) }{ \partial \pi_k } = \sum_{i=1}^{n} \frac{ f_{p_k}( Y_i - X_i^T \beta_k; 0, \sigma_k ) }{ \sum_{j=1}^{m} \hat{\pi}_j f_{p_j}( Y_i - X_i^T \beta_j; 0, \sigma_j ) } - n \lambda_2 \frac{ p'_{\lambda_2}( \hat{\pi}_k ) }{ C_0 + p_{\lambda_2}( \hat{\pi}_k ) } - \delta = 0. (A2)
We first consider k ≤ m_0. By the law of large numbers, we have
\sum_{i=1}^{n} \frac{ f_{p_k}( Y_i - X_i^T \beta_k; 0, \sigma_k ) }{ \sum_{j=1}^{m} \hat{\pi}_j f_{p_j}( Y_i - X_i^T \beta_j; 0, \sigma_j ) } = O_p(n). (A3)
For k ≤ m_0, we have π̂_k = π_k^0 + O_p(1/√n) > ½ min{π_1^0, …, π_{m_0}^0} with probability tending to one. Since λ_2 → 0, p'_{λ_2}(π̂_k) = o_p(1) and p_{λ_2}(π̂_k) = o_p(1), so
n \lambda_2 \frac{ p'_{\lambda_2}( \hat{\pi}_k ) }{ C_0 + p_{\lambda_2}( \hat{\pi}_k ) } = o_p(n). (A4)
By (A2)–(A4), we have δ = O_p(n). For k ≥ m_0 + 1 and π̂_k < C/√n, we have π̂_k = O_p(1/√n). Since √n λ_2 → ∞, C_0 is sufficiently small, and p_λ(·) is the SCAD penalty, we have
\frac{1}{n} \cdot n \lambda_2 \frac{ p'_{\lambda_2}( \hat{\pi}_k ) }{ C_0 + p_{\lambda_2}( \hat{\pi}_k ) } = \frac{ \lambda_2^2 }{ C_0 + \lambda_2 \hat{\pi}_k },
which is of order at least √n λ_2 → ∞ in probability. Therefore, the first and third terms in Equation (A2) are dominated by the second term, which proves Equation (A1). This completes the proof of (a).
To prove (b), for any θ with m_0 components, we write θ_{m_0} = (θ_{m_0,1}, θ_{m_0,2}) for any θ_{m_0} in the neighborhood ||θ_{m_0} − θ_{m_0,0}|| = O_p(1/√n), where θ_{m_0,2} collects all zero effects, i.e., β_{kj} = 0 for k = 1, …, m_0 and j = t_k + 1, …, p. By (2), we have
\tilde{Q}_n\{ (\theta_{m_0,1}, \theta_{m_0,2}) \} - \tilde{Q}_n\{ (\theta_{m_0,1}, 0) \} = \left[ Q_n\{ (\theta_{m_0,1}, \theta_{m_0,2}) \} - Q_n\{ (\theta_{m_0,1}, 0) \} \right] - \left[ P_{n1}\{ (\theta_{m_0,1}, \theta_{m_0,2}) \} - P_{n1}\{ (\theta_{m_0,1}, 0) \} \right] = \left[ Q_n\{ (\theta_{m_0,1}, \theta_{m_0,2}) \} - Q_n\{ (\theta_{m_0,1}, 0) \} \right] - n \sum_{k=1}^{m_0} \sum_{j=t_k+1}^{p} p_{\lambda_1}( |\beta_{kj}| ).
According to the mean value theorem, we have
Q_n\{ (\theta_{m_0,1}, \theta_{m_0,2}) \} - Q_n\{ (\theta_{m_0,1}, 0) \} = \left( \frac{ \partial Q_n\{ (\theta_{m_0,1}, \gamma) \} }{ \partial \theta_{m_0,2} } \right)^{T} \theta_{m_0,2}, (A5)
where ||γ|| ≤ ||θ_{m_0,2}|| = O(n^{-1/2}). Since
\left\| \frac{ \partial Q_n\{ (\theta_{m_0,1}, \gamma) \} }{ \partial \theta_{m_0,2} } - \frac{ \partial Q_n\{ (\theta_{m_0,01}, 0) \} }{ \partial \theta_{m_0,2} } \right\| \le \left\| \frac{ \partial Q_n\{ (\theta_{m_0,1}, \gamma) \} }{ \partial \theta_{m_0,2} } - \frac{ \partial Q_n\{ (\theta_{m_0,1}, 0) \} }{ \partial \theta_{m_0,2} } \right\| + \left\| \frac{ \partial Q_n\{ (\theta_{m_0,1}, 0) \} }{ \partial \theta_{m_0,2} } - \frac{ \partial Q_n\{ (\theta_{m_0,01}, 0) \} }{ \partial \theta_{m_0,2} } \right\| \le \sum_{i=1}^{n} R_1(z_i) \| \gamma \| + \sum_{i=1}^{n} R_1(z_i) \| \theta_{m_0,1} - \theta_{m_0,01} \| = \left( \| \gamma \| + \| \theta_{m_0,1} - \theta_{m_0,01} \| \right) O_p(n) = O_p(n^{1/2}),
and
\frac{ \partial Q_n\{ (\theta_{m_0,01}, 0) \} }{ \partial \theta_{m_0,2} } = O_p(n^{1/2}),
we have
\frac{ \partial Q_n\{ (\theta_{m_0,1}, \gamma) \} }{ \partial \theta_{m_0,2} } = O_p(n^{1/2}). (A6)
By (A5) and (A6), we have
Q_n\{ (\theta_{m_0,1}, \theta_{m_0,2}) \} - Q_n\{ (\theta_{m_0,1}, 0) \} = O_p(n^{1/2}) \sum_{k=1}^{m_0} \sum_{j=t_k+1}^{p} | \beta_{kj} |.
Thus, we have
\tilde{Q}_n\{ (\theta_{m_0,1}, \theta_{m_0,2}) \} - \tilde{Q}_n\{ (\theta_{m_0,1}, 0) \} = \sum_{k=1}^{m_0} \sum_{j=t_k+1}^{p} \left[ O_p(n^{1/2}) | \beta_{kj} | - n\, p_{\lambda_1}( |\beta_{kj}| ) \right].
By condition (C3), for 0 < |t| ≤ n^{-1/2} log n, we have O_p(n^{1/2}) |t| < n p_{λ_1}(|t|) with probability tending to one. Therefore, we obtain
\tilde{Q}_n\{ (\theta_{m_0,1}, \theta_{m_0,2}) \} - \tilde{Q}_n\{ (\theta_{m_0,1}, 0) \} < 0. (A7)
By (A7), with probability tending to 1 as n → ∞, we have
\tilde{Q}_n\{ (\theta_{m_0,1}, \theta_{m_0,2}) \} - \tilde{Q}_n\{ (\hat{\theta}_{m_0,1}, 0) \} = \left[ \tilde{Q}_n\{ (\theta_{m_0,1}, \theta_{m_0,2}) \} - \tilde{Q}_n\{ (\theta_{m_0,1}, 0) \} \right] + \left[ \tilde{Q}_n\{ (\theta_{m_0,1}, 0) \} - \tilde{Q}_n\{ (\hat{\theta}_{m_0,1}, 0) \} \right] < 0.
Thus, this completes the proof of part (b).
By the result of Theorem 1, there exists a √n-consistent local maximizer θ̂_{n1} of Q̃_n{(θ_1, 0)} such that θ̂_n = (θ̂_{n1}, 0) satisfies
\frac{ \partial \tilde{Q}_n( \hat{\theta}_n ) }{ \partial \theta_1 } = \left[ \frac{ \partial Q_n(\theta) }{ \partial \theta_1 } - \frac{ \partial P_{n1}(\theta) }{ \partial \theta_1 } - \frac{ \partial P_{n2}(\theta) }{ \partial \theta_1 } \right]_{ \theta = \hat{\theta}_n } = 0. (A8)
By the Taylor’s expansion, we have
Q n ( θ ) θ 1 θ = θ ^ n = Q n ( θ 01 ) θ 1 + 2 Q n ( θ 01 ) θ 1 θ 1 T + o p ( n ) ( θ ^ n 1 θ 01 ) ,
P n 1 ( θ ) θ 1 θ = θ ^ n = P n 1 ( θ 01 ) + P n 1 ( θ 01 ) + o p ( n ) ( θ ^ n 1 θ 01 ) ,
P n 2 ( θ ) θ 1 θ = θ ^ n = P n 2 ( θ 01 ) + P n 2 ( θ 01 ) + o p ( n ) ( θ ^ n 1 θ 01 ) .
By substituting Equations (A9)–(A11) into (A8), we have
\left[ \frac{ \partial^2 Q_n(\theta_{01}) }{ \partial \theta_1 \partial \theta_1^T } - P''_{n1}(\theta_{01}) - P''_{n2}(\theta_{01}) + o_p(n) \right] ( \hat{\theta}_{n1} - \theta_{01} ) = - \frac{ \partial Q_n(\theta_{01}) }{ \partial \theta_1 } + P'_{n1}(\theta_{01}) + P'_{n2}(\theta_{01}).
By the conditions (C4), (C5), and (C6), we have
- \frac{1}{n} \frac{ \partial^2 Q_n(\theta_{01}) }{ \partial \theta_1 \partial \theta_1^T } = Q_1(\theta_{01}) + o_p(1),
\frac{1}{ \sqrt{n} } \frac{ \partial Q_n(\theta_{01}) }{ \partial \theta_1 } \xrightarrow{D} N( 0, Q_1(\theta_{01}) ).
By Slutsky’s theorem, we have
\sqrt{n} \left( Q_1(\theta_{01}) + \frac{ P''_{n1}(\theta_{01}) }{ n } + \frac{ P''_{n2}(\theta_{01}) }{ n } \right) ( \hat{\theta}_{n1} - \theta_{01} ) + \frac{ P'_{n1}(\theta_{01}) + P'_{n2}(\theta_{01}) }{ \sqrt{n} } \xrightarrow{D} N( 0, Q_1(\theta_{01}) ).
This completes the proof of part (c). □

References

  1. Quandt, R.E. A new approach to estimating switching regressions. J. Am. Stat. Assoc. 1972, 67, 306–310. [Google Scholar] [CrossRef]
  2. Goldfeld, S.M.; Quandt, R.E. A Markov model for switching regressions. J. Econom. 1973, 1, 3–15. [Google Scholar] [CrossRef]
  3. Jacobs, R.A.; Jordan, M.I.; Nowlan, S.J.; Hinton, G.E. Adaptive mixtures of local experts. Neural Comput. 1991, 3, 79–87. [Google Scholar] [CrossRef] [PubMed]
  4. Wedel, M.; Kamakura, W.A. Market Segmentation: Conceptual and Methodological Foundations; Springer Science & Business Media: Berlin, Germany, 2000. [Google Scholar]
  5. Skrondal, A.; Rabe-Hesketh, S. Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models; Chapman and Hall/CRC: Boca Raton, FL, USA, 2004. [Google Scholar]
  6. Peel, D.; McLachlan, G. Finite Mixture Models; John Wiley & Sons: Toronto, ON, Canada, 2000. [Google Scholar]
  7. McLachlan, G.J.; Lee, S.X.; Rathnayake, S.I. Finite mixture models. Annu. Rev. Stat. Appl. 2019, 6, 355–378. [Google Scholar] [CrossRef]
  8. Yu, C.; Yao, W.; Yang, G. A selective overview and comparison of robust mixture regression estimators. Int. Stat. Rev. 2020, 88, 176–202. [Google Scholar] [CrossRef]
  9. Chen, J.; Khalili, A. Order selection in finite mixture models with a nonsmooth penalty. J. Am. Stat. Assoc. 2009, 104, 187–196. [Google Scholar] [CrossRef]
  10. Peng, H.; Huang, T.; Zhang, K. Model Selection for Gaussian Mixture Models. Stat. Sin. 2017, 27, 147–169. [Google Scholar]
  11. Wang, P.; Puterman, M.L.; Cockburn, I.; Le, N. Mixed Poisson regression models with covariate dependent rates. Biometrics 1996, 52, 381–400. [Google Scholar] [CrossRef]
  12. Khalili, A.; Chen, J. Variable selection in finite mixture of regression models. J. Am. Stat. Assoc. 2007, 102, 1025–1038. [Google Scholar] [CrossRef]
  13. Jiang, Y. Robust variable selection for mixture linear regression models. Hacet. J. Math. Stat. 2016, 45, 549–559. [Google Scholar] [CrossRef]
  14. Luo, R.; Wang, H.; Tsai, C.L. On mixture regression shrinkage and selection via the MR-Lasso. Int. J. Pure Appl. Math. 2008, 46, 403–414. [Google Scholar]
  15. Jiang, Y.; Huang, M.; Wei, X.; Tonghua, H.; Hang, Z. Robust mixture regression via an asymmetric exponential power distribution. Commun. Stat.-Simul. Comput. 2022, 1–12. [Google Scholar] [CrossRef]
  16. Wang, X.; Feng, Z. Component selection for exponential power mixture models. J. Appl. Stat. 2023, 50, 291–314. [Google Scholar] [CrossRef] [PubMed]
  17. Yu, C.; Wang, X. A new model selection procedure for finite mixture regression models. Commun. Stat.-Theory Methods 2020, 49, 4347–4366. [Google Scholar] [CrossRef]
  18. Chen, X. Robust mixture regression with Exponential Power distribution. arXiv 2020, arXiv:2012.10637. [Google Scholar]
  19. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 1977, 39, 1–22. [Google Scholar] [CrossRef]
  20. Hunter, D.R.; Lange, K. A tutorial on MM algorithms. Am. Stat. 2004, 58, 30–37. [Google Scholar] [CrossRef]
  21. Kobayashi, G. Skew exponential power stochastic volatility model for analysis of skewness, non-normal tails, quantiles and expectiles. Comput. Stat. 2016, 31, 49–88. [Google Scholar] [CrossRef]
  22. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
  23. Zou, H.; Li, R. One-step sparse estimates in nonconcave penalized likelihood models. Ann. Stat. 2008, 36, 1509–1533. [Google Scholar]
  24. Wang, H.; Li, R.; Tsai, C.L. Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika 2007, 94, 553–568. [Google Scholar] [CrossRef]
  25. Watnik, M.R. Pay for play: Are baseball salaries based on performance? J. Stat. Educ. 1998, 6, 1–5. [Google Scholar] [CrossRef]
Table 1. Order selection results in Scenario 1.

n       M1                                              M2
        Underfitted   Correctly Fitted   Overfitted     Underfitted   Correctly Fitted   Overfitted
200     0.00          0.99               0.01           0.40          0.60               0.00
600     0.00          0.99               0.01           0.00          0.99               0.01
800     0.00          0.99               0.01           0.00          0.99               0.01
1000    0.00          1.00               0.00           0.00          0.99               0.01
Table 2. Variable selection and parameter estimation results in Scenario 1.

n       M1                                          M2
        RMSE_πc   RMSE_βc   NCZ     NIZ             RMSE_πc   RMSE_βc   NCZ     NIZ
200     0.092     0.535     2.900   0.200           0.143     0.352     2.667   0.000
        0.048     0.596     2.700   0.000           0.141     0.207     2.333   0.000
        0.073     0.703     2.670   0.000           0.048     0.427     2.833   0.000
600     0.024     0.154     2.990   0.000           0.023     0.156     2.990   0.000
        0.025     0.153     2.980   0.000           0.025     0.151     2.990   0.000
        0.021     0.154     2.990   0.000           0.022     0.156     2.980   0.000
800     0.022     0.142     2.990   0.000           0.020     0.145     3.000   0.000
        0.020     0.138     2.980   0.000           0.021     0.153     3.000   0.000
        0.019     0.141     2.990   0.000           0.020     0.138     3.000   0.000
1000    0.014     0.123     2.990   0.000           0.015     0.130     3.000   0.000
        0.016     0.122     3.000   0.000           0.014     0.121     3.000   0.000
        0.014     0.121     3.000   0.000           0.014     0.122     3.000   0.000
Table 3. Order selection results in Scenario 2.

n       M1                                              M2
        Underfitted   Correctly Fitted   Overfitted     Underfitted   Correctly Fitted   Overfitted
200     0.50          0.20               0.30           0.16          0.64               0.20
600     0.07          0.81               0.12           0.00          0.99               0.01
800     0.03          0.75               0.22           0.01          0.98               0.01
1000    0.11          0.84               0.05           0.00          0.99               0.01
Table 4. Variable selection and parameter estimation results in Scenario 2.

n       M1                                          M2
        RMSE_πc   RMSE_βc   NCZ     NIZ             RMSE_πc   RMSE_βc   NCZ     NIZ
200     0.285     8.381     2.500   0.000           0.088     0.964     2.722   0.056
        0.126     0.457     1.500   0.000           0.058     1.957     2.778   0.076
        0.223     2.876     2.000   2.000           0.090     0.852     2.772   0.000
600     0.057     0.671     2.893   0.000           0.048     0.261     2.963   0.000
        0.062     1.119     2.844   0.011           0.040     0.240     2.876   0.000
        0.055     1.264     2.872   0.034           0.044     0.264     2.896   0.000
800     0.065     0.715     2.897   0.013           0.034     0.223     2.845   0.000
        0.053     0.874     2.892   0.012           0.032     0.228     2.957   0.000
        0.047     1.241     2.887   0.000           0.033     0.193     2.929   0.000
1000    0.063     0.905     2.912   0.012           0.029     0.198     2.906   0.000
        0.056     0.926     2.923   0.011           0.031     0.191     2.946   0.000
        0.047     0.837     2.921   0.012           0.033     0.188     2.979   0.000
Table 5. Order selection results in Scenario 3.

n       M1                                              M2
        Underfitted   Correctly Fitted   Overfitted     Underfitted   Correctly Fitted   Overfitted
200     0.60          0.25               0.15           0.32          0.66               0.02
600     0.00          0.74               0.26           0.00          0.98               0.02
800     0.03          0.73               0.24           0.00          0.99               0.01
1000    0.05          0.79               0.16           0.00          0.99               0.01
Table 6. Variable selection and parameter estimation results in Scenario 3.

n       M1                                          M2
        RMSE_πc   RMSE_βc   NCZ     NIZ             RMSE_πc   RMSE_βc   NCZ     NIZ
200     0.236     2.312     2.000   0.000           0.039     0.766     3.000   0.500
        0.165     3.903     2.400   0.200           0.054     0.969     3.000   0.167
        0.060     1.732     2.600   1.400           0.049     0.929     3.000   0.667
600     0.025     0.164     2.887   0.000           0.023     0.122     2.874   0.000
        0.025     0.156     2.889   0.000           0.024     0.127     2.869   0.000
        0.027     0.162     2.896   0.000           0.025     0.124     2.877   0.000
800     0.024     0.154     2.893   0.000           0.018     0.114     2.878   0.000
        0.023     0.137     2.886   0.000           0.017     0.123     2.931   0.000
        0.019     0.134     2.897   0.000           0.017     0.123     2.931   0.000
1000    0.021     0.132     2.894   0.000           0.017     0.097     2.924   0.000
        0.020     0.138     2.924   0.000           0.017     0.097     2.924   0.000
        0.019     0.122     2.891   0.000           0.016     0.114     2.971   0.000
Table 7. Order selection results in Scenario 4.

n       M1                                              M2
        Underfitted   Correctly Fitted   Overfitted     Underfitted   Correctly Fitted   Overfitted
200     0.49          0.41               0.10           0.07          0.72               0.21
600     0.13          0.28               0.59           0.02          0.98               0.00
800     0.19          0.31               0.50           0.00          1.00               0.00
1000    0.10          0.39               0.51           0.01          0.99               0.00
Table 8. Variable selection and parameter estimation results in Scenario 4.

n       M1                                          M2
        RMSE_πc   RMSE_βc   NCZ     NIZ             RMSE_πc   RMSE_βc   NCZ     NIZ
200     0.014     5.455     2.889   0.444           0.218     0.971     2.500   0.000
        0.180     1.337     2.889   0.222           0.228     1.669     2.000   0.000
        0.079     2.956     2.889   0.000           0.320     2.284     2.250   0.000
600     0.050     2.672     2.832   0.000           0.054     0.372     2.776   0.000
        0.053     1.256     2.717   0.000           0.055     0.273     2.724   0.000
        0.054     2.136     2.846   0.038           0.061     0.271     2.878   0.000
800     0.051     1.134     2.811   0.000           0.035     0.398     2.600   0.000
        0.045     1.535     2.623   0.000           0.039     0.163     2.600   0.000
        0.047     2.724     2.747   0.000           0.022     0.183     3.000   0.000
1000    0.074     1.217     2.942   0.000           0.035     0.347     2.973   0.000
        0.073     1.736     2.974   0.103           0.031     0.220     2.697   0.000
        0.047     3.734     2.772   0.000           0.025     0.129     2.949   0.000
Table 9. Parameter estimates for baseball salary data.

Covariates   Linear Model   M1 Comp1   M1 Comp2   M2 Comp1   M2 Comp2
x_0          5.48           4.81       5.66       4.70       4.67
x_1          -              -          -          -          -
x_2          -1.54          -          -          -          -
x_3          -              -          -          -          -
x_4          -              -          0.01       0.03       0.02
x_5          -              -          -          -          0.01
x_6          -              -          -          -          -
x_7          -              -          -          -          -
x_8          0.01           0.01       0.02       0.01       -
x_9          0.01           -          -          0.03       0.01
x_10         -0.01          -          -          -          -
x_11         -              0.03       -          -          0.01
x_12         -              -          -          -          -
x_13         1.52           2.04       -          3.13       2.16
x_14         -0.48          -          -          -          -
x_15         1.35           1.60       -          2.73       1.28
x_16         -              -          -          0.01       1.40
x_1 x_13     -              -          -          -          -
x_1 x_14     -              -          -          -          10.05
x_1 x_15     -              -          -          0.01       -
x_1 x_16     -4.38          -          -          -          -
x_3 x_13     -              -          -          -          -
x_3 x_14     -              -          -          -0.01      -0.02
x_3 x_15     -              -          -          -          -
x_3 x_16     -              -          -          0.01       -
x_7 x_13     0.01           -          -          0.03       -
x_7 x_14     0.03           -          -          -          0.02
x_7 x_15     -              -          -          -          -
x_7 x_16     -              -          -          -          -
x_8 x_13     -              -          0.01       -          0.01
x_8 x_14     -              0.01       -          0.01       -
x_8 x_15     -              -          0.02       -          -
x_8 x_16     0.02           -          -          -          0.02
Table 10. Model parameter estimates for baseball salary data.

Parameter   M1 (Comp1, Comp2)   M2 (Comp1, Comp2)
π̂           0.69, 0.31          0.84, 0.16
p̂           2, 2                1.05, 1.49
σ̂           -, -                2.27, 7.13
λ_1         0.300               0.220
λ_2         0.040               0.016
MBIC        569.64              547.25